Chapter 1
Introduction 


1.1  What  is  Combinatorial  Optimization? 

Graphs 

Most  combinatorial  optimization  problems  are  defined  on  graphs.  We  assume  some 
familiarity  with  this  subject  and  give  only  a  short  introduction.  Most  of  the  books 
referenced  at  the  end  of  these  notes  also  contain  introductory  sections  on  graphs. 

An (undirected) graph G is a pair (V, E), where V is a finite set of vertices and E
is a finite set of edges. Each edge has associated with it two vertices, not necessarily
distinct, called its ends. An edge with identical ends is a loop. Two non-loop edges
with the same ends are said to be parallel. If G has no loops or parallel edges it is
simple.

It  is  typical  to  draw  pictures  of  graphs  in  which  vertices  are  depicted  as  points 
and  edges  as  line  segments  joining  these  points.  The  graph  pictured  in  Figure  1.1 
has  4  vertices,  6  edges,  one  loop,  and  two  parallel  edges.  There  is  one  labeled  edge, 
e,  and  there  are  two  labeled  vertices,  x  and  y,  the  ends  of  e.  When  there  is  no 
ambiguity,  as  there  is  not  for  a  simple  graph,  we  write  e  =  xy. 


Figure 1.1: An example graph.

A path P from a vertex $v_0$ to a vertex $v_k$ in a graph G, sometimes called a $v_0$-$v_k$
path, is a sequence of vertices and edges of G,
$P = (v_0, e_1, v_1, e_2, v_2, \ldots, v_{k-1}, e_k, v_k)$, such that $e_i = v_{i-1}v_i$
$(i = 1, \ldots, k)$. When no ambiguity arises we write $P = (v_0, \ldots, v_k)$.
If $v_0, \ldots, v_k$ are distinct, the path is called simple. A path that
is simple except that $v_0 = v_k$ is called a circuit. The graph in Figure 1.1 contains
exactly four distinct circuits. Note that a loop is a circuit with exactly one edge
(and one vertex). A pair of parallel edges also forms a circuit.

A  graph  is  connected  if  for  every  pair  of  vertices  x  and  y,  there  is  a  path  from 
x to y. A graph is a forest if it includes no circuits. A connected forest is a tree.
A  subgraph  of  a  given  graph  is  obtained  by  deleting  some  of  its  vertices  and  edges. 
Of  course,  when  a  vertex  is  deleted  from  a  graph,  then  all  incident  edges  must  also 
be  deleted.  An  edge  can  be  deleted  without  deleting  its  end-vertices.  A  spanning 
subgraph  of  a  graph  is  one  in  which  only  edges  have  been  deleted.  A  spanning  tree  is 
a  spanning  subgraph  that  is  a  tree.  The  graph  in  Figure  1.1  has  exactly  7  spanning 
trees. 

A  Definition  of  Combinatorial  Optimization 

A  combinatorial  optimization  problem  can  be  defined  in  general  as  follows. 

Definition 1.1.1 Let E be a finite set, let $\mathcal{S}$ be a family of subsets of E, and
let $w \in \mathbb{R}^E$ be a real-valued weight function defined on the elements of E. The
associated combinatorial optimization (CO) problem is to find $S^* \in \mathcal{S}$ such that

$$w(S^*) = \min_{S \in \mathcal{S}} w(S),$$

where $w(S) := \sum_{e \in S} w(e)$. []

Example  1.1.2  Traveling  Salesman  Problem.  Let  Kn  denote  the  complete  graph 
on  n  vertices.  Thus,  Kn  is  a  simple  graph  on  n  vertices  in  which  every  two  vertices 
are joined by an edge. Let w be a weight function defined on the edges of Kn. In
the  typical  traveling-salesman  interpretation  the  vertices  of  Kn  represent  cities,  and 
the  weights  distances  between  these  cities.  Of  course,  other  interpretations  are  also 
possible,  and  important. 

The  n-city  traveling  salesman  problem  is  to  construct  a  circuit  that  passes 
through  each  vertex,  and  has  minimum  total  weight.  A  circuit  passing  through 
each  vertex  of  a  graph  is  called  a  tour.  In  the  graph  K4  given  in  Figure  1.2,  and  for 
the  weight  function  indicated  on  the  edges,  the  minimum  tour  has  weight  8. 

To formulate the TSP as a CO problem, take E to be the set of edges of Kn (so
that |E| = n(n - 1)/2), and let $\mathcal{S}$ be the family of subsets of E forming tours. The
TSP is an example of a hard CO problem. It is known to be NP-complete: No
polynomial-time algorithm is known for it, and none is expected. []

Figure 1.2: K4
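
To make Definition 1.1.1 concrete for the TSP, the following Python sketch enumerates the family of tours of Kn by brute force and returns one of minimum weight. The function name `min_tour`, the encoding of edges as two-element frozensets, and the weights on K4 are assumptions made for this illustration only; in particular they are not the weights of Figure 1.2.

```python
from itertools import permutations


def min_tour(n, w):
    """Brute-force TSP on K_n (n >= 3).

    w maps each edge, encoded as frozenset({u, v}), to its weight.
    Returns (weight, tour), where tour is a frozenset of edges.
    """
    best_weight, best_tour = None, None
    # Fix vertex 0 as the start so rotations of a tour are not repeated.
    for perm in permutations(range(1, n)):
        order = (0,) + perm
        tour = frozenset(
            frozenset((order[i], order[(i + 1) % n])) for i in range(n)
        )
        weight = sum(w[e] for e in tour)  # w(S) = sum of w(e) over e in S
        if best_weight is None or weight < best_weight:
            best_weight, best_tour = weight, tour
    return best_weight, best_tour


# Hypothetical weights on K_4 (vertices 0..3).
w = {
    frozenset((0, 1)): 1, frozenset((1, 2)): 2, frozenset((2, 3)): 1,
    frozenset((0, 3)): 2, frozenset((0, 2)): 5, frozenset((1, 3)): 5,
}
best_weight, best_tour = min_tour(4, w)
```

The loop visits (n - 1)! orderings, so this is usable only for very small n, which is consistent with the TSP being a hard CO problem.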


Example  1.1.3  Minimum  Spanning  Trees.  Let  G  be  a  connected  graph.  Given  a 
weight  function  defined  on  the  edges  of  G,  the  minimum  spanning  tree  problem  is 
to  find  a  spanning  tree  of  G  that  has  minimum  total  weight.  This  is  clearly  a  CO 
problem. 

The  MST  problem  has  numerous  applications.  One  simple  one  is  the  following. 
Suppose  a  collection  of  remote  computer  installations  has  been  specified,  and  a  cost 
is  given  for  connecting  each  separate  pair  of  terminals  by  a  direct  communication 
link.  It  is  reasonable  to  ask  for  a  minimum-cost  collection  of  links  the  construction  of 
which  would  allow  any  one  terminal  to  communicate  with  any  other,  communication 
being possible exactly when there is a path of links between the two terminals in
question. Given that the costs are nonnegative (and who would doubt that they
are), this is exactly an MST problem.

The  MST  problem  is  an  example  of  an  easy  CO  problem.  There  are  several 
polynomial-time  algorithms  known  to  solve  it.  [] 


1.2  Independence  Systems 

In  this  section  we  develop  a  general  algorithm  that  can  be  applied  to  a  wide  variety 
of  CO  problems.  For  most,  it  is  only  a  heuristic,  but  for  the  MST  problem  we  will 
see  that  it  gives  an  exact  solution. 

Definition 1.2.1 Let E be a finite set, and let $\mathcal{I}$ be a family of independent subsets
of E. The pair $(E, \mathcal{I})$ is called an independence system if

(I1) the empty set is independent, and

(I2) subsets of independent sets are independent. []

Example  1.2.2  Both  the  TSP  and  the  MST  problem  can  be  formulated  as  CO 
problems  on  independence  systems.  For  example,  in  the  MST  problem  one  can 
take as the underlying set E the set of edges of the given graph G, and as the
family $\mathcal{I}$ the family of edge-sets of forests of G. The MST problem for G and
a weight function w is then equivalent to finding a maximum-weight independent
subset of E with respect to the weight function $(M - w(e) : e \in E)$, for a sufficiently
large constant M.

The  TSP  may  be  handled  similarly.  [] 

The following algorithm, named by Jack Edmonds, can be applied to any
independence system.

Algorithm  1.2.3  The  Greedy  Algorithm. 

Input: An independence system $(E, \mathcal{I})$, a weight function $w \in \mathbb{R}^E$, and an
independence oracle.

Output: An independent set $I_g$ (of hopefully large total weight).

Comment: In order to apply an algorithm to $(E, \mathcal{I})$, $\mathcal{I}$ must be specified in some
computationally accessible form. The typically assumed form is that of an independence
oracle. Thus, it is assumed that a subroutine or “oracle” is available that can
determine in constant time if a given $X \subseteq E$ is independent. In applications this
oracle is replaced by a concrete calculation.

begin

    sort E as $e_1, \ldots, e_{|E|}$ so that for some k

        $w(e_1) \ge \ldots \ge w(e_k) > 0 \ge w(e_{k+1}) \ge \ldots \ge w(e_{|E|})$;

    $I_g := \emptyset$;

    for j := 1 until k do

        if $I_g \cup \{e_j\} \in \mathcal{I}$ then $I_g := I_g \cup \{e_j\}$;

end

Note that this algorithm requires at most |E| calls to the independence oracle;
moreover, for any reasonable implementation, the time requirements in addition to
these calls are bounded by a polynomial in |E|. The algorithm is thus said to run
in oracle polynomial time. More detailed estimates for concrete instances will be
given later.
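
The greedy algorithm translates almost line for line into code. The following Python sketch assumes the independence oracle is passed in as a callable; the names `greedy` and `is_independent`, and the small example system (all subsets of size at most 2), are illustrative assumptions rather than part of the notes.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Set


def greedy(weights: Dict[Hashable, float],
           is_independent: Callable[[FrozenSet], bool]) -> Set:
    """Greedy algorithm for an independence system given by an oracle.

    weights maps each element of E to w(e); is_independent plays the
    role of the independence oracle.
    """
    # Only elements of positive weight are scanned (e_1, ..., e_k),
    # in order of non-increasing weight.
    candidates = sorted((e for e in weights if weights[e] > 0),
                        key=lambda e: weights[e], reverse=True)
    chosen: Set = set()
    for e in candidates:
        # One oracle call per scanned element.
        if is_independent(frozenset(chosen | {e})):
            chosen.add(e)
    return chosen


# Hypothetical independence system: every subset of size at most 2.
weights = {"a": 5, "b": 3, "c": 4, "d": -1}
result = greedy(weights, lambda s: len(s) <= 2)
```

Here the two heaviest elements are kept and the element of negative weight is never examined.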

For $X \subseteq E$, a base B of X is a maximal independent subset of X, where
maximal independent means that for every $e \in X \setminus B$, $B \cup \{e\}$ is not independent.
The rank of X, r(X), is defined by

$$r(X) = \max\{|B| : B \text{ a base of } X\}.$$

The lower rank of X, $r_l(X)$, is defined by

$$r_l(X) = \min\{|B| : B \text{ a base of } X\}.$$
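
For very small systems the rank and lower rank can be computed directly from these definitions. The sketch below does so by brute force; the helper names `bases`, `rank` and `lower_rank`, and the example system (matchings of a path with three edges), are assumptions chosen for illustration.

```python
from itertools import combinations


def bases(X, is_independent):
    """All maximal independent subsets of X (brute force, exponential)."""
    X = list(X)
    independent = [frozenset(c)
                   for size in range(len(X) + 1)
                   for c in combinations(X, size)
                   if is_independent(frozenset(c))]
    # B is a base if no element of X \ B can be added to it.
    return [B for B in independent
            if all(not is_independent(B | {e}) for e in X if e not in B)]


def rank(X, is_independent):
    return max(len(B) for B in bases(X, is_independent))


def lower_rank(X, is_independent):
    return min(len(B) for B in bases(X, is_independent))


# Hypothetical system: matchings in the path a - b - c - d.
edges = {1: ("a", "b"), 2: ("b", "c"), 3: ("c", "d")}


def is_matching(S):
    ends = [v for e in S for v in edges[e]]
    return len(ends) == len(set(ends))  # no vertex is used twice


E = set(edges)
r = rank(E, is_matching)
r_l = lower_rank(E, is_matching)
```

In this example the bases of E are {2} and {1, 3}, so the rank and the lower rank differ.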

In terms of these quantities, we have the following general performance bound
on the greedy algorithm.


Theorem 1.2.4 (Jenkyns 1976) Where $I_o$ is an optimal-weight independent set and
$w(I_o) > 0$,

$$\min\left\{ \frac{r_l(F)}{r(F)} : F \subseteq E,\ r(F) \neq 0 \right\} \le \frac{w(I_g)}{w(I_o)} \le 1.$$


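Proof. Let $w_j$ denote $w(e_j)$ and write $E_j = \{e_1, \ldots, e_j\}$ $(j = 1, \ldots, k)$. Define
$w_{k+1} = 0$. Then we have

$$w(I_g) = \sum_{j=1}^{k} |I_g \cap E_j| (w_j - w_{j+1})$$

and

$$w(I_o) = \sum_{j=1}^{k} |I_o \cap E_j| (w_j - w_{j+1}).$$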

But for each j, by the nature of the greedy algorithm, $I_g \cap E_j$ is a maximal
independent subset of $E_j$. Hence, $r_l(E_j) \le |I_g \cap E_j|$; moreover, $|I_o \cap E_j| \le r(E_j)$
since $I_o \cap E_j$ is independent (being a subset of $I_o$) and $r(E_j)$ is the size of a biggest
independent subset of $E_j$. Combining the above facts, and denoting the ‘min’ in the
theorem by q, we have

$$w(I_g) \ge \sum_{j=1}^{k} r_l(E_j)(w_j - w_{j+1}) \ge \sum_{j=1}^{k} q\, r(E_j)(w_j - w_{j+1}) \ge q \sum_{j=1}^{k} |I_o \cap E_j|(w_j - w_{j+1}) = q\, w(I_o),$$

as required. []

Corollary 1.2.5 If $r(F) = r_l(F)$ for all $F \subseteq E$, then the greedy algorithm gives an
optimal solution of the maximum independent set problem. []

The  objects  singled  out  by  (1.2.5)  are  known  in  the  literature  as  “matroids”  (see 
[2]).  We  will  not  study  their  general  properties  further  here. 

Exercises

1.1 Let $\mathcal{I}$ be the family of subsets of edge-sets of tours in $K_n$. Define the rank r and lower
rank $r_l$ for $\mathcal{I}$ as in §1.2. Let

$$q = \min\left\{ \frac{r_l(F)}{r(F)} : F \subseteq E(K_n) \text{ and } r(F) \neq 0 \right\}.$$

Show that $q \ge 1/2$.


Chapter  2 


Minimum  Spanning  Trees 


2.1  The  Greedy  Algorithm 

An  interesting  overview  of  the  subject  of  minimum  spanning  trees  is  given  in  “On 
the history of the minimum spanning tree problem,” by R. L. Graham and P. Hell
(1985),  Annals  of  the  History  of  Computing  7  43-57. 

There are several known polynomial-time algorithms for the MST problem. We
begin  by  showing  that  the  greedy  algorithm  (Algorithm  1.2.3)  is  one  of  them. 

We  make  use  here  and  throughout  the  remainder  of  these  notes  of  the  word 
‘maximal’  in  a  set-theoretic  sense.  Thus,  an  (edge)  maximal  forest  is  one  such  that 
the  addition  of  any  edge  in  the  given  graph,  outside  the  forest,  destroys  the  forest 
property  (creates  a  circuit).  The  word  ‘minimal’  is  used  similarly.  Hence,  there 
is an important difference between the words ‘maximum’ and ‘minimum’ and the
words  ‘maximal’  and  ‘minimal’.  (See  also  the  definition  of  ‘base’  in  §1.2.) 

A  (connected)  component  of  a  graph  G  is  a  maximal  connected  subgraph. 

Lemma 2.1.1 (a) Every (edge) maximal forest of a connected graph is a spanning
tree. (b) Every spanning tree of a connected graph on n vertices has exactly n - 1
edges.

Proof.  Let  G  be  a  connected  graph  on  n  vertices.  The  lemma  is  trivial  if  n  =  1. 
Assume  n  >  1. 

We first prove (a). Let T be a maximal forest of G. If T is not spanning, let
x be a vertex of G not incident to an edge of T. Since G is connected, and has at
least two vertices, there must be an edge e incident to x. Adding e to T obviously
cannot create any circuits. This contradicts the maximality of T, and proves that
it is spanning. Now suppose that T is not connected. Let T' be a component of T,
let $x \in V(T')$ and let $y \notin V(T')$. Now since G is connected, there is a path from x
to y in G. Let e be the first edge of this path with exactly one end in T'. Clearly
adding e to T creates no circuit, again contradicting maximality. This completes
the proof of (a).

Now  consider  (b).  If  every  vertex  of  G  is  incident  to  at  least  two  edges  in  T, 
that  is,  has  degree  at  least  2  in  T,  then  it  is  easy  to  see  that  T  contains  a  circuit 
(see  Exercise  1).  Hence,  there  is  some  vertex  incident  to  exactly  one  edge  of  T.  But 
deleting  this  vertex  and  the  incident  edge,  we  obtain  a  spanning  tree  of  a  graph 
with  one  less  vertex.  The  result  now  follows  by  induction.  [] 

For a subset of edges F of a graph G, let G(F) denote the subgraph of G induced
by F, that is, the subgraph with edges F and vertices exactly those vertices incident
to some edge in F.

Corollary 2.1.2 Let F be a subset of edges of a graph G. Then the number of
edges in any maximal forest of G(F) is exactly |V(G(F))| minus the number of
components of G(F). Hence, for the independence system given by the edge-sets of
the forests of a graph, $r_l$ and r are identical. It follows that Algorithm 1.2.3 (the
greedy algorithm) solves the problem of finding a maximum-weight forest. []
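
For forests, the independence oracle of Algorithm 1.2.3 can be replaced by a concrete calculation. The sketch below is one possible way to do this: it uses a union-find structure (an implementation choice not discussed in the notes) to test whether an edge joins two different components of the current forest. The function names and the example graph are assumptions. The second function recovers a minimum spanning tree through the weights M - w(e) of Example 1.2.2.

```python
def max_weight_forest(n, edges):
    """Greedy algorithm on the forests of a graph with vertices 0..n-1.

    edges is a list of (weight, u, v). Returns the chosen edges.
    """
    parent = list(range(n))  # union-find forest over the vertices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges, key=lambda t: t[0], reverse=True):
        if w <= 0:
            break  # Algorithm 1.2.3 scans only positive-weight elements
        ru, rv = find(u), find(v)
        if ru != rv:  # the ends lie in different components: no circuit
            parent[ru] = rv
            forest.append((w, u, v))
    return forest


def minimum_spanning_tree(n, edges):
    """MST of a connected graph via the weights M - w(e)."""
    M = max(w for w, _, _ in edges) + 1  # large enough: M - w(e) > 0
    flipped = [(M - w, u, v) for w, u, v in edges]
    return [(M - w, u, v) for w, u, v in max_weight_forest(n, flipped)]


# Hypothetical connected graph on 4 vertices.
edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)]
tree = minimum_spanning_tree(4, edges)
total = sum(w for w, _, _ in tree)
```

Since every transformed weight is positive and the graph is connected, the maximum-weight forest found is a spanning tree, in line with Lemma 2.1.1.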

This greedy algorithm for the MST problem is now generally attributed to O.
Boruvka (1926) [“On a minimal problem,” (in Czech) Prace Moravske Prirodovedecke
Spolecnosti Brne 3], although for years the earliest reference was thought to be J. B.
Kruskal (1956) [“On the shortest spanning subtree of a graph and the traveling
salesman problem,” Proceedings of the American Mathematical Society 7 48-50].

We  postpone  a  discussion  of  the  theoretical  efficiency  of  MST  algorithms  until 
the  end  of  the  next  section. 


2.2  Prim’s  Algorithm 

The main algorithm of this section, Prim’s algorithm, is more efficient than the
greedy algorithm on “dense graphs,” simple graphs with $O(|V|^2)$ edges. Our derivation
of this algorithm is based on three simple graph-theoretic lemmas. The proofs
are typical of this subject: They are much easier to picture than to write down.


Lemma 2.2.1 Let $C_1, C_2$ be distinct circuits of a graph G, and let $e \in C_1 \cap C_2$.
Then $(C_1 \cup C_2) \setminus \{e\}$ contains a circuit.^1

^1 Even though ‘path’, ‘circuit’ and ‘tree’ have been defined in §1.1 as made up of vertices and
edges, it is frequently convenient, as it is here, to consider them to be simply subsets of edges. We
do this whenever convenient, without further comment.

Proof. $C_1$ and $C_2$ distinct implies there is an edge $f \in C_1 \setminus C_2$. Let f = xy.
Traversing $C_1$ starting at y and moving away from x, we must find some first vertex
y' in common with $C_2$ (since $C_1$ and $C_2$ have a common edge). Let $P_y$ be the path
so constructed. Similarly, construct x' and $P_x$. Now $x' \neq y'$, for otherwise $C_1$ and
$C_2$ have at most one vertex in common. Let Q be the path in $C_2$ between x' and y'
not including e. Then $\{f\} \cup P_x \cup P_y \cup Q$ is the desired circuit. []

Lemma 2.2.2 Let G be a connected graph, and let T be a spanning tree of G. Then
for any edge $e \in E(G) \setminus T$, $T \cup \{e\}$ contains a unique circuit, denoted C(T, e), and
called the fundamental circuit of T at e. For any edge $f \in C(T, e)$, $(T \cup \{e\}) \setminus \{f\}$
is a spanning tree.

Proof. Let e = xy. Since T is spanning, it includes x and y, and since T is
connected, it includes a path P joining x and y. But then $P \cup \{e\}$ is clearly a
circuit, proving that $T \cup \{e\}$ contains at least one circuit. Suppose that this circuit
is not unique. Then there are two distinct circuits $C_1, C_2 \subseteq T \cup \{e\}$, both of which
contain e since T contains no circuit. But then
by Lemma 2.2.1, $(C_1 \cup C_2) \setminus \{e\}$ contains a circuit contained in T, a contradiction.
Hence, the circuit $C(T, e) = P \cup \{e\}$ is unique.

Now consider $f = uv \in C(T, e)$, and let $T' = (T \cup \{e\}) \setminus \{f\}$. If e = f, then
evidently T' = T is a spanning tree. Assume $e \neq f$. Clearly T' contains no circuit,
by the uniqueness of C(T, e). Every vertex other than u and v is certainly incident
to some edge of T', since it is incident to some edge of T, and u and v are each
incident to some edge in $C(T, e) \setminus \{f\}$. Finally, note that T' is connected since every
path in T between two vertices either does not include f, in which case it is a path
in T', or it does include f, in which case replacing f by the path $C(T, e) \setminus \{f\}$ yields
a (not necessarily simple) path in T'. []

Definition 2.2.3 Let G = (V, E) be a graph, and let $X \subseteq V$. Define $\delta(X)$ to be
the set of all edges in G with one end in X and the other in $V \setminus X$. Subsets of edges
of the form $\delta(X)$ are called cuts. []

Lemma 2.2.4 Let $C^*$ be a cut and C a circuit of some graph G = (V, E). Then
$|C^* \cap C| \neq 1$.

Proof. Suppose $e \in C^* \cap C$ and e = xy. Now $C^* = \delta(X)$ for some $X \subseteq V$, and we
may assume $x \in X$. But then by definition $y \in V \setminus X$. Consider the path $C \setminus \{e\}$.
Traversing it starting at x, there must be some last vertex $x' \in X$. The next edge
in this path then has one end in X and one end in $V \setminus X$, and is the required second
edge in $C^* \cap C$. []
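
Lemma 2.2.4 is easy to check by enumeration on a small example. The following Python sketch computes the cut of Definition 2.2.3 for every vertex subset of K4 and records how many edges each cut shares with one fixed circuit. The function name `delta`, the encoding of edges as ordered pairs (u, v) with u < v, and the chosen circuit are assumptions for illustration.

```python
from itertools import combinations


def delta(edges, X):
    """The cut delta(X): all edges with exactly one end in X."""
    X = set(X)
    return {e for e in edges if (e[0] in X) != (e[1] in X)}


# K_4 on vertices 0..3, and the circuit through 0, 1, 2.
V = range(4)
E = set(combinations(V, 2))
C = {(0, 1), (1, 2), (0, 2)}

# Sizes of the intersection of C with every cut of K_4.
sizes = {len(delta(E, X) & C)
         for size in range(len(V) + 1)
         for X in combinations(V, size)}
```

For this circuit every cut meets C in either no edge or two edges, so the value 1 never occurs.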

The next result is the key to the spanning tree problem.

Theorem 2.2.5 Let G = (V, E) be a connected graph with edge weighting w, and
let F be the edge-set of a forest in G. Let $V_1, \ldots, V_k$ be a list of the vertex-sets of the
components of the edge-induced subgraph G(F). Suppose for some j $(1 \le j \le k)$,
that e is a minimum-weight edge in $\delta(V_j)$. Then among all spanning trees that are
minimum-weight extensions of F, there is one containing e.

Remark:  (a)  We  have  previously  proved  that  all  maximal  forests  of  a  connected 
graph  are  spanning  trees.  Thus,  F  is  contained  in  some  spanning  tree. 

(b)  The  above  result  applies  to  any  forest  F.  F  need  not  be  contained  in  any 
minimum  spanning  tree  of  G.  One  can  imagine  applications  in  which,  because  of 
certain  “external  constraints,”  some  edges  are  required  to  be  in  the  solution,  even 
though  they  are  in  no  globally-minimum  tree.  [] 

Proof of Theorem 2.2.5. Given Lemmas 2.2.2 and 2.2.4, the proof is straightforward.
Let T be a minimum-weight extension of F. If $e \in T$, we are done. Suppose
not. Let $C^* = \delta(V_j)$, and let C = C(T, e) be the fundamental circuit of T at e.
Then $e \in C^* \cap C$, and so by Lemma 2.2.4, there is a second edge $f \in C^* \cap C$. Let
$T' = (T \cup \{e\}) \setminus \{f\}$. Then T' is a spanning tree by Lemma 2.2.2, and

$$w(T') = w(T) + w(e) - w(f) \le w(T)$$

by the choice of e. Moreover, $f \notin F$, since no edge of F has its ends in different
components of G(F), so T' is again an extension of F. This completes the proof. []

The  validity  of  Prim’s  algorithm,  given  below,  is  immediate  from  Theorem  2.2.5. 
This  motivates,  in  part,  the  statement  of  the  theorem.  However,  this  result  can  also 
be used to prove the validity of the greedy algorithm, Algorithm 1.2.3. In this
context  it  is  most  natural  to  consider  an  alternate,  minimization  version  of  the 
greedy  algorithm  in  which  one  does  not  stop  with  a  forest,  but  continues  through 
all  edges  until  a  minimum  spanning  tree  is  constructed.  The  proof  that  the  resulting 
tree  is  indeed  minimum  proceeds  by  induction.  Clearly  the  empty  forest  is  contained 
in  a  minimum  tree.  Assume,  as  the  inductive  step,  that  the  forest  constructed  up 
to  some  point  in  the  algorithm  is  contained  in  some  minimum  tree.  We  need  only 
prove  that  this  assumption  is  preserved  when  an  edge  is  added  to  the  forest.  But 
when an edge is added it is, by the ordering (now from minimum- to maximum-weight
edge), the minimum-weight edge then available that does not create a circuit
together  with  previously  chosen  edges,  that  is,  it  is  a  minimum-weight  edge  with 
ends  in  different  components  of  the  current  forest.  Hence,  by  Theorem  2.2.5,  the 
inductive  assumption  is  preserved. 

Algorithm 2.2.6 Prim’s Algorithm

Input: A connected graph G = (V, E) with edge weighting w.

Output: A minimum spanning tree T of G.

begin

    $X := \{v\}$ for some $v \in V$;

    $T := \emptyset$;

    while $X \neq V$ do begin

        find $e \in \delta(X)$ such that $w(e) = \min_{f \in \delta(X)} w(f)$;

        $T := T \cup \{e\}$;

        $X := X \cup \{y\}$ where e = xy and $x \in X$;

    end

end

Theorem 2.2.7 Algorithm 2.2.6 is correct. []

We  conclude  this  section  with  a  discussion  of  the  computational  complexity  of 
Prim’s  algorithm.  We  do  not  include  full  details.  These  can  for  the  most  part  be 
found  among  the  references  given  in  the  bibliography.  For  this  discussion  we  assume 
that G = (V, E) is a connected graph, n = |V| and m = |E|. We also assume that
E is stored so that the ends of any edge can be found in time O(1).

O(mn): It is trivial to give an O(mn) implementation. To store X, simply keep a bit representation,
that is, keep an array COMP, say, of length n in which the entry for a vertex is 1 if that vertex is
in X, and 0 otherwise. On each execution of the while loop, we find e by scanning all edges, and
checking the location of the ends of each edge using COMP. The scan takes time O(m). Updating
COMP and T is O(1). Since the while loop is executed n - 1 times, the overall bound is clearly O(mn),
or $O(n^3)$ for dense graphs.

$O(n^2)$: A shortcoming of the above implementation is that too much time is spent finding the edge
e in the while loop. This situation may be improved by keeping appropriate information for the
vertices in $V \setminus X$. Create arrays SMALL and SMALL-EDGE, each of length n, and initialize them as
follows. For each $u \in V \setminus \{v\}$, SMALL[u] = w(vu) if vu exists, and = $+\infty$ otherwise; SMALL-EDGE[u]
is a pointer to v in the first case, and null otherwise. This initialization takes time O(n). Now to
find e we simply scan the entries of SMALL for vertices in $V \setminus X$, and take the minimum value that
arises. If SMALL[u] is this minimum, then SMALL-EDGE[u] is used to update T. This all takes time
O(n), as does updating these two structures. Since the while loop is executed n - 1 times, the
overall bound is $O(n^2)$.
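
The array-based implementation just described can be sketched in Python as follows. The input format (a symmetric weight matrix with `math.inf` for missing edges), the choice of vertex 0 as the starting vertex, and the function name `prim_dense` are assumptions made for this illustration; the lists `small` and `small_edge` play the roles of SMALL and SMALL-EDGE.

```python
import math


def prim_dense(w):
    """Prim's algorithm in O(n^2) time on a connected graph.

    w is an n x n symmetric matrix; w[u][v] is the weight of edge uv,
    or math.inf if there is no such edge. Returns a list of tree edges.
    """
    n = len(w)
    in_X = [False] * n  # the array COMP of the text
    in_X[0] = True      # X := {v} with v = 0
    small = [w[0][u] for u in range(n)]
    small_edge = [0 if w[0][u] < math.inf else None for u in range(n)]
    tree = []
    for _ in range(n - 1):
        # Scan SMALL over V \ X for the minimum entry: O(n).
        y = min((u for u in range(n) if not in_X[u]),
                key=lambda u: small[u])
        if small[y] == math.inf:
            raise ValueError("graph is not connected")
        tree.append((small_edge[y], y))
        in_X[y] = True
        # Update both arrays for the new vertex y: O(n).
        for u in range(n):
            if not in_X[u] and w[y][u] < small[u]:
                small[u] = w[y][u]
                small_edge[u] = y
    return tree


inf = math.inf
# Hypothetical graph on 4 vertices.
w = [[inf, 1, 4, inf],
     [1, inf, 3, 5],
     [4, 3, inf, 2],
     [inf, 5, 2, inf]]
tree = prim_dense(w)
total = sum(w[x][y] for x, y in tree)
```

A matrix input suits the dense case this bound is aimed at; for sparse graphs an adjacency-list representation would be the more natural choice.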

O(m log n):^2 This bound is not as good as the above one for dense graphs. However, in some
practical problems m is roughly linear in n, and then the bound is better. We give two approaches
that achieve this bound. The first is more direct.

(a) We use basically the same approach used to obtain the $O(n^2)$ bound, but now we keep a
(simple) heap (see [1]) in order to quickly find the minimum element in SMALL. The computation
of the minimum is then O(1). Extra work is however required to update the heap. The
initial heap is constructed in time O(n); that this is possible is a standard result for heaps.
The updates proceed as follows. When a new vertex y is added to X, as many as d(y) (the
degree of y) values in SMALL could change. Updating the heap for each of these requires time
O(log n), and so the total work to add y to X is O(d(y) log n). Since $\sum_{v \in V} d(v) = 2m$, the
O(m log n) bound follows.

^2 log n stands for $\log_2 n$.

(b) This bound is derived in [9]. It involves a substantial modification of the algorithm as stated,
one that more fully makes use of the generality of Theorem 2.2.5. We keep, instead of a single
vertex set X, a collection of disjoint vertex sets $X_1, \ldots, X_k$, for some k. Initially k = n so
that each $X_j$ is a single vertex. At a general iteration we find, for each $X_j$, a minimum-weight
edge in $\delta(X_j)$. This can be done in a relatively obvious way in one pass through E. Let A
be the list of edges so generated. Clearly $|A| \ge k/2$, since every $X_j$ is incident to at least one
member of A, and each edge of A is incident to at most two $X_j$. Now by repeated application
of Theorem 2.2.5, we may add all the edges of A to T. The $X_j$ must then be updated. This
takes time O(n), and thus the whole iteration takes time O(m). Now, since $|A| \ge k/2$, k is
reduced by at least half at the next iteration. Thus, the number of iterations is bounded by
log n, and the overall bound follows.
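
The two approaches above can be sketched in Python under some stated assumptions. For (a), Python's `heapq` module has no operation for updating an entry in place, so the sketch swaps in a common variant: candidate edges are pushed onto the heap and stale entries are skipped when popped, which still gives O(m log n) on simple graphs. For (b), ties are broken by edge index so that the selected edges never close a circuit, and the relabelling of the sets is deliberately naive, so it does not achieve the O(n) update per iteration described in the text. All function names and the example graph are hypothetical.

```python
import heapq


def prim_heap(n, edges, root=0):
    """Approach (a), with a heap of candidate edges and lazy deletion."""
    adj = [[] for _ in range(n)]
    for w, u, v in edges:
        adj[u].append((w, v))
        adj[v].append((w, u))
    in_X = [False] * n
    in_X[root] = True
    heap = [(w, root, v) for w, v in adj[root]]
    heapq.heapify(heap)
    tree, total = [], 0
    while heap and len(tree) < n - 1:
        w, x, y = heapq.heappop(heap)
        if in_X[y]:
            continue  # stale entry: both ends are already in X
        in_X[y] = True
        tree.append((x, y))
        total += w
        for w2, z in adj[y]:
            if not in_X[z]:
                heapq.heappush(heap, (w2, y, z))
    return total, tree


def boruvka(n, edges):
    """Approach (b): add a minimum-weight edge of delta(X_j) for every X_j."""
    comp = list(range(n))  # comp[v] labels the set X_j containing v
    tree, k = [], n
    while k > 1:
        best = {}  # label of X_j -> index of its chosen edge
        for i, (w, u, v) in enumerate(edges):  # one pass through E
            cu, cv = comp[u], comp[v]
            if cu == cv:
                continue
            for c in (cu, cv):
                # Compare (weight, index) pairs to break ties consistently.
                if c not in best or (w, i) < (edges[best[c]][0], best[c]):
                    best[c] = i
        if not best:
            raise ValueError("graph is not connected")
        for i in sorted(set(best.values())):
            w, u, v = edges[i]
            cu, cv = comp[u], comp[v]
            if cu != cv:
                tree.append(edges[i])
                comp = [cu if c == cv else c for c in comp]  # naive merge
                k -= 1
    return tree


# Hypothetical connected graph on 4 vertices: (weight, u, v).
edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)]
heap_total, heap_tree = prim_heap(4, edges)
boruvka_tree = boruvka(4, edges)
```

On this example both functions return a spanning tree of total weight 6, and the second one needs only two iterations.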

O(m + n log n): This bound is due to Fredman and Tarjan. It is achieved using Fibonacci heaps,
a discussion of which is beyond the scope of these notes [M. L. Fredman and R. E. Tarjan (1984),
“Fibonacci heaps and their uses in improved network optimization algorithms,” Proceedings of the
25th Annual IEEE Symposium on Foundations of Computer Science 338-346]. They also obtain a
bound of the form $O(m\,\beta(m, n))$, where

$$\beta(m, n) = \min\{ j : \log^{(j)} n \le m/n \},$$

and $\log^{(j)} n$ denotes the j-th iterated logarithm of n.

A difficulty with these bounds is that Fibonacci heaps are difficult to implement, and in practice
do not seem to be as efficient as other theoretically less efficient kinds of heaps. Successive attempts
to remedy this situation are made in [D. D. Sleator and R. E. Tarjan (1983), “Self-adjusting binary
trees,” Proceedings of the 15th Annual ACM Symposium on Theory of Computing, 235-245; D. D.
Sleator and R. E. Tarjan, “Self-adjusting heaps,” SIAM Journal on Computing, to appear; R. E.
Tarjan, D. D. Sleator, M. L. Fredman and R. Sedgewick, “The pairing heap: A new form of
self-adjusting heap,” AT&T Bell Laboratories Technical Report, September 18, 1985]. However, the
corresponding theoretical bounds have not yet been obtained for these modifications.


Exercises 


2.1 For sets X and Y define

$$X \,\Delta\, Y = (X \setminus Y) \cup (Y \setminus X).$$

$X \Delta Y$ is called the symmetric difference of X and Y. Let C' and C'' be two circuits of a
graph G.

(a) Let H be a subgraph of G in which every vertex has even degree. Show that if H has
at least one edge, then it contains a circuit.

(b) Show that there are edge-disjoint circuits $C_1, \ldots, C_m$ $(m \ge 1)$, such that
$C' \Delta C'' = C_1 \cup \ldots \cup C_m$. (This result strengthens Lemma 2.2.1.)

2.2 Prove the validity of the following algorithm:

Algorithm. Dual Greedy Algorithm.

Input: A connected graph G = (V, E) with edge weighting w.

Output: A minimum spanning tree T of G.

begin

    Sort E so that $w(e_1) \ge \ldots \ge w(e_{|E|})$;

    T := E;

    for j := 1 until |E| do

        if $T \setminus \{e_j\}$ is connected then $T := T \setminus \{e_j\}$;

end

2.3 Let G be a simple graph with $n = |V| \ge 3$. A Hamiltonian circuit or tour in G is a
circuit with n vertices. G is Hamiltonian if it has a tour. It is easy to give examples of
graphs that are not Hamiltonian, but one would expect that, if a graph has enough edges,
then it is very likely Hamiltonian. Use the following steps to prove that, if every vertex of
G has degree at least n/2, then G is indeed Hamiltonian.

(a) Let $P = (v_0, \ldots, v_k)$ be a simple path in G, and assume that neither $v_0$ nor $v_k$ is
joined by an edge to a vertex not in P (thus, P cannot be “extended” to a longer
path). Prove, using a counting argument and the fact that $d(v_0) + d(v_k) \ge n$, that for
some j, $v_0 v_{j+1}$ and $v_j v_k$ are edges of G (draw a picture!). Note that deleting $v_j v_{j+1}$
and adding these two edges to P yields a circuit.

(b) Let C be a circuit with fewer than n vertices. Use a counting argument, using the
fact that either C or its complement has at most n/2 vertices, to show that some
vertex in C is joined to a vertex not in C. Thus, by deleting an appropriate edge of
C we may “extend” C to a longer path.

Can this proof be made into an algorithm? If so, can you estimate its complexity?

2.4  Show  that  the  symmetric  difference  of  two  cuts  is  a  disjoint  union  of  cuts. 

Chapter 3
Shortest  Paths 


3.1  Introduction 

The shortest path problem is one of the most-studied problems in combinatorial
optimization, and could easily be the subject of several chapters. Among its variations
are the k-shortest path problem, the longest path, most-dependable path and
maximum capacity path problems, and the problem of finding shortest paths with
an even or odd number of edges. Our treatment will, however, be rather brief. We
consider only the problem of finding shortest paths from a single given vertex to all
other vertices. For this version we examine a linear-programming (LP)^1 approach
(the Ford algorithm), the Moore-Bellman algorithm, and Dijkstra’s algorithm.

Those  wishing  a  more  detailed  treatment  are  referred  to  [6].  For  additional 
material  see  also: 

• Domschke, K. (1972), “Kürzeste Wege in Graphen,” Mathematical Systems
in Economics 2, Verlag A. Hain, Meisenheim am Glan

• Dreyfus, S. E. (1969), “An appraisal of some shortest-path algorithms,”
Operations Research 17 395-412 (a classic article)

•  Glover,  F.,  D.  Klingman  and  Phillips  (1985),  “A  new  polynomially  bounded 
shortest  path  algorithm,”  Operations  Research  33  65-73  (this  paper  discusses 
the  more  recent  literature) 

•  Syslo,  M.  M.,  N.  Deo  and  J.  S.  Kowalik  (1983),  the  bibliography  of  Discrete 
Optimization  Algorithms,  with  Pascal  Programs,  Prentice-Hall,  Englewood 
Cliffs,  New  Jersey 

^1 A very brief introduction to linear programming is given in §4.1.

3.2 Definitions


It is customary to study shortest-paths in the context of directed graphs. We begin
with  a  short  introduction  to  these. 

A directed graph, or digraph, D = (V, A) is an ordered pair made up of a finite
set V of vertices and a finite set A of arcs, such that each arc has associated with
it an ordered pair of (not necessarily distinct) vertices called its tail and its head,
respectively. For an arc e, we denote the tail by t(e) and the head by h(e). t(e)
is then a predecessor of h(e) and h(e) a successor of t(e). When no confusion can
arise (for example, when there are no similarly directed parallel arcs) we write
e = (t(e), h(e)).

All the notions previously introduced for (undirected) graphs, such as tree, path
and circuit, can be applied to digraphs by considering the underlying graph, simply
ignoring the directions on the arcs. For some of these notions we also define
corresponding directed versions. Thus, a v_0→v_k dipath P = (v_0, ..., v_k) is a path such
that (v_{i-1}, v_i) is an arc (i = 1, ..., k). Dicircuit is defined similarly. For X ⊆ V,
δ^-(X) = {e ∈ A : t(e) ∈ X, h(e) ∈ V\X} and δ^+(X) = δ^-(V\X) (see Definition 2.2.3).

Definition 3.2.1 Let D = (V, A) be a digraph with a distinguished vertex r,
called the root, and a weight or length² function w defined on A. For a dipath
P = (v_0, ..., v_k), the length of P, w(P), is defined by

\[ w(P) = \sum_{i=1}^{k} w(v_{i-1}, v_i). \]

The shortest path problem, SPP, for (D, w, r) is to find, for each vertex v of D, a
minimum-length, or shortest, dipath from r to v.

It may seem that a more natural version of the SPP would be one in which a
single dipath between a pair of specified vertices is the goal. Indeed, this form of the
problem is discussed in some of what follows. However, most of the practical
algorithms that solve this “more natural” problem also solve the one in the definition,
and so this version will be the focus of our treatment.

Finally, one can also define the shortest path problem in an obvious way for
undirected graphs. Instances of this problem can be converted to a directed
problem by replacing each edge by two oppositely directed arcs, both having the same
length as the original edge. If the length on each edge is nonnegative, then the
techniques given in this chapter are applicable to the resulting directed problem in
a straightforward way. However, if there are negative-length edges, then a more
sophisticated approach using “nonbipartite matching” is called for. This approach
is discussed in [6].
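The edge-to-arc conversion just described can be sketched in a few lines of Python. This is only an illustration under assumptions not in the notes: the function name `to_directed` and the representation of an edge as a triple `(x, y, length)` are hypothetical choices.

```python
def to_directed(edges):
    """Replace each undirected edge (x, y, length) by two opposite arcs.

    Negative lengths are rejected, since the simple conversion is only
    valid for nonnegative edge lengths.
    """
    arcs = []
    for x, y, length in edges:
        if length < 0:
            raise ValueError("negative-length edge: simple conversion not valid")
        arcs.append((x, y, length))   # arc with tail x and head y
        arcs.append((y, x, length))   # oppositely directed arc, same length
    return arcs


# Hypothetical two-edge example.
edges = [("a", "b", 2), ("b", "c", 5)]
arcs = to_directed(edges)
```

Each edge contributes exactly two arcs, so the directed instance has 2|E| arcs.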

²The reader is cautioned that these “lengths” need not be actual physical lengths, and so may
be negative. For this reason the use of the word ‘weight’ here might have some advantages, but we
prefer the more natural ‘length’.
3.3 Minty’s Analog Algorithm and an LP Approach

George Minty suggested the following analog approach. Let G = (V, E) be a graph
with two distinguished vertices r and s and with a nonnegative length function w
defined on the edges. Construct a “string model” of G in which the edges have
actual lengths proportional to their length as defined by w. Now, grasp this model
by its two “ends,” that is, by the two vertices r and s, and pull. One would expect
that the length of a shortest path between r and s will then be the distance by which
they are separated after this pulling. In fact, some additional reflection reveals that
this expectation is not quite justified, since some of the strings may be “caught” in
loops with others, but that some retying of the strings will remove these loops and
give the desired result.

The  relevance  of  Minty’s  method  here  is:  It  suggests  that  we  can  solve  the  SPP, 
which  is  a  minimization  problem,  in  a  natural  way  as  a  maximization  problem.  The 
formulation  below  uses  this  dual  approach. 

Let  D  =  (V,  A)  be  a  digraph  with  length  function  w.  Let  r  and  s  be  two 
distinguished  vertices  of  D.  Consider  the  following  linear  program: 

\[
\begin{array}{rl}
\max & u(s) - u(r) \\
\text{s.t.} & u(h(e)) - u(t(e)) \le w(e) \quad (e \in A)
\end{array}
\qquad (3.1)
\]

To help understand (3.1), suppose that there exists an r→v dipath for each
v ∈ V, and assume that D contains no negative (-length) dicircuit. Let u(v) be the
length of a shortest r→v dipath for each v ∈ V. Note that we may assume any such
dipath is simple, since otherwise it contains a dicircuit, and any dicircuit can be
deleted because, by assumption, it has nonnegative length. Now clearly u(r) = 0,
and so u(s) − u(r) is evidently the length of a shortest r→s dipath. Let e ∈ A, and
let P be a shortest r→t(e) dipath. Appending e to P we obtain an r→h(e) dipath
P' (not necessarily simple); moreover,

\[ u(h(e)) \le w(P') = w(P) + w(e) = u(t(e)) + w(e). \]

That is, u(h(e)) − u(t(e)) ≤ w(e). Hence, if D contains no negative dicircuit, then
(3.1) is feasible.
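As a small numerical illustration of this argument, the following Python sketch checks the constraints of (3.1) for a vector of shortest-path lengths. The digraph, its lengths, and the name `is_feasible` are hypothetical; the distances in `dist` were worked out by hand for this instance only.

```python
def is_feasible(u, arcs):
    """Check u(h(e)) - u(t(e)) <= w(e) for every arc e = (tail, head, w)."""
    return all(u[head] - u[tail] <= w for tail, head, w in arcs)


# Hypothetical example digraph; arcs are (tail, head, length) triples.
arcs = [("r", "a", 4), ("r", "b", 1), ("b", "a", 2), ("a", "s", 1), ("b", "s", 5)]

# Shortest-path lengths from r for this instance (computed by hand).
dist = {"r": 0, "a": 3, "b": 1, "s": 4}

feasible = is_feasible(dist, arcs)
objective = dist["s"] - dist["r"]

# Raising u(s) above the shortest-path length violates the arc (a, s).
too_large = dict(dist, s=5)
```

Here `dist` satisfies every constraint, and the constraint on the arc (a, s) is exactly what blocks any larger value of u(s) − u(r).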

Theorem 3.3.1 If (3.1) has an optimal solution u*, then u*(s) − u*(r) is the length
of a shortest r→s dipath.


Proof. Let P = (v_0, ..., v_k) be a v_0→v_k dipath. Then we have

\[
u^*(v_k) - u^*(v_0) = \sum_{j=1}^{k} \bigl( u^*(v_j) - u^*(v_{j-1}) \bigr)
\le \sum_{j=1}^{k} w(v_{j-1}, v_j) = w(P).
\]

It follows that for any vertices x and y, and any x→y dipath P, u*(y) − u*(x) is
a lower bound on w(P). Note also that if u* is tight for each arc in P, that is, if
u*(h(e)) − u*(t(e)) = w(e) for each arc e of P, then u*(y) − u*(x) = w(P).

Now let X be the set of all vertices reachable from r by u*-tight dipaths. Clearly
r ∈ X, and if s ∈ X, then it follows by the computations of the previous paragraph
that u*(s) − u*(r) is the length of a shortest r→s dipath, as required. Assume
s ∉ X.

Let α = min{w(e) − [u*(h(e)) − u*(t(e))] : e ∈ δ^-(X)}. Then α > 0, by the
choice of X (if δ^-(X) is empty, any α > 0 will do). For each x ∈ V\X, define
u'(x) := u*(x) + α, and for x ∈ X define u'(x) := u*(x). The constraints of (3.1)
are then affected only for arcs e with at least one end in V\X. By the choice of α,
they are still valid on δ^-(X), and they are obviously unaffected for any arcs with
both ends in V\X. It remains only to consider arcs e ∈ δ^+(X). But for any such arc
we have u'(h(e)) − u'(t(e)) = u*(h(e)) − u*(t(e)) − α ≤ w(e). Since
u'(s) − u'(r) = u*(s) − u*(r) + α > u*(s) − u*(r),
this contradicts the optimality of u*. □

The  question  of  existence  of  an  optimal  solution  for  (3.1)  is  answered  by  the 
following  theorem. 

Theorem 3.3.2 LP (3.1) is feasible iff D has no negative dicircuits. If (3.1) is
feasible, it is bounded iff there exists at least one r→s dipath.

Proof. In the proof of Theorem 3.3.1 it is shown that the length of any r→s
dipath is an upper bound on u(s) − u(r), for any feasible solution u of (3.1). Conversely,
suppose that there exists no r→s dipath. Let X be the set of all vertices reachable
by dipaths from r. Then we have s ∉ X. Now given any feasible solution u of
(3.1), we can clearly add an arbitrary positive constant to u(x) for each x ∈ V\X, not
affecting feasibility, and increasing the value of the objective u(s) − u(r) without
bound. Thus, (3.1) is unbounded. This proves the boundedness criterion stated in
the theorem.

Now consider the feasibility question. Suppose C = (v_0, ..., v_k = v_0) is a
negative dicircuit. Then the constraints of (3.1) imply

\[
0 = u(v_k) - u(v_0) = \sum_{i=1}^{k} \bigl( u(v_i) - u(v_{i-1}) \bigr)
\le \sum_{i=1}^{k} w(v_{i-1}, v_i) = w(C) < 0,
\]

a contradiction. It follows that the existence of a negative dicircuit precludes the
existence of a feasible solution for (3.1).

Finally, consider the converse. Suppose that D has no negative dicircuit, and let
X be the set of all vertices x such that an r→x dipath exists. Let u(x) be the length
of a shortest r→x dipath for each x ∈ X. By the discussion following (3.1), u(x)
is well defined and satisfies the constraints of (3.1) on X. If X = V, we are done;
otherwise, pick r' ∈ V\X, and repeat the above construction: Let X' be the subset
of vertices reachable from r', and define u' appropriately on X'. The constraints
of (3.1) are then clearly satisfied on X' by u'. We would now like to extend u' to
X ∪ X', but there is a difficulty: Even though there are, by definition, no arcs from
X' to X\X', there may be arcs from X\X' to X'. This situation is remedied by
choosing an appropriately large constant and adding it to u on X\X'. Now setting
u' := u on X\X', we obtain a feasible solution on X ∪ X'. Repeating this entire
construction until a solution is found for all of V completes the proof. □

The above feasibility condition, that negative dicircuits do not exist, is
fundamental. The SPP, though generally considered well solved, is in fact NP-hard for
digraphs with negative dicircuits: A solution to this problem is easily seen to imply
a solution of the TSP.


3.4  Solving  (3.1) 

Obviously,  (3.1)  can  be  solved  using  the  simplex  method,  or  any  other  method  for 
solving  LPs.  However,  the  problem  is  simpler  than  that.  L.  R.  Ford,  Jr.,  offered 
the  following  direct  method. 

Algorithm  3.4.1  Ford’s  Algorithm. 

Input: An SPP (D, r, w) with no negative dicircuits.

Output:  An  optimal  solution  u  of  (3.1). 
Comment: For any x, if no r→x dipath exists, the algorithm terminates with u(x) =
+∞. This is consistent with Theorem 3.3.2.

begin
    u(r) := 0;
    for x ∈ V\{r} do u(x) := +∞;
    while u(h(e)) − u(t(e)) > w(e) for some e ∈ A do
        u(h(e)) := u(t(e)) + w(e);
end
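A minimal Python sketch of Ford’s algorithm follows. It makes choices the pseudocode leaves open: arcs are hypothetical `(tail, head, w)` triples, and violated constraints are found by repeatedly scanning the arc list in a fixed order, which is just one of the many orders the algorithm permits. The `pred` dictionary records the last arc that changed each u value, as suggested in the text. Like the algorithm itself, the sketch assumes there are no negative dicircuits and would loop forever on one reachable from the root.

```python
INF = float("inf")


def ford(vertices, arcs, root):
    """Return (u, pred) for an SPP instance with no negative dicircuits."""
    u = {v: INF for v in vertices}
    u[root] = 0
    pred = {}
    changed = True
    while changed:                       # stop when no constraint is violated
        changed = False
        for tail, head, w in arcs:
            # Violated constraint: u(head) - u(tail) > w.  Written this way
            # so that an infinite u(tail) never triggers an update.
            if u[tail] + w < u[head]:
                u[head] = u[tail] + w
                pred[head] = tail        # remember the arc that changed u(head)
                changed = True
    return u, pred


# Hypothetical example with one negative arc and an unreachable vertex x.
vertices = ["r", "a", "b", "s", "x"]
arcs = [("r", "a", 4), ("r", "b", 1), ("b", "a", 2), ("a", "s", -1), ("x", "r", 3)]
u, pred = ford(vertices, arcs, "r")
```

Following `pred` back from s gives the dipath r, b, a, s of length 1 + 2 − 1 = 2, and u(x) stays at +∞ because no r→x dipath exists.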

As is noted again below, Ford’s algorithm is not polynomial (and is not even
finite if we allow negative dicircuits). However, it is instructive, and at least
convenient for hand computations. Note also that the algorithm does not actually output
the shortest paths, when they exist, but simply their lengths. However, it is not
hard to see that these can be found by appropriately recording the arcs e for which
u(h(e)) changes in the while.

Theorem 3.4.2 Ford’s algorithm is finite; moreover, if for a particular x ∈ V,
u(x) < +∞ at termination, then u(x) is the length of a shortest r→x dipath, and
if u(x) = +∞, then there is no r→x dipath.

Proof. We give only an outline. The key is to prove that whenever u(x) is finite for
a particular vertex x, then u(x) is the length of some simple r→x dipath. This is
certainly true at the start of the algorithm. One needs only prove that the property
is maintained by the assignment statement in the while.

Now, given the above fact about u, to see that the algorithm is finite, observe
that each update strictly decreases some u(x). But there are only a finite number
of simple r→x dipaths, and hence only a finite number of possible values for u(x).

Finally, the fact that the u(x) at termination have the correct values follows
because, first, when finite they do correspond to the length of some dipath, and,
second, the constraints of (3.1), which are evidently satisfied at termination, imply
that u(x) is a lower bound for the length of any r→x dipath. □

Ford’s algorithm has two major shortcomings: It can only be applied to digraphs
that are known not to contain negative dicircuits, and it has exponential worst-case
behavior. We have not explicitly demonstrated this second claim, but it is not hard
to do so. That this is possible should, in any case, not be surprising since the order
in which updates occur in the algorithm is completely arbitrary. The bound given
in the proof is certainly bad.

Both the above difficulties are dealt with by the Moore-Bellman algorithm. This
algorithm works roughly as follows. To start, an ordering of the vertices is fixed.
Then repeated passes are made through the vertices, using this ordering, |V| passes
in total. Each time a vertex is encountered during a pass, an “update” is performed
on all arcs with tail equal to that vertex. Thus, on each pass each arc e is examined
exactly once. The total work for this procedure is clearly polynomial.

Algorithm 3.4.3 Moore-Bellman Algorithm

Input: A digraph D = (V, A) with arc lengths w. We assume V = {1, ..., n} and
r = 1.

Output: The conclusion that D has a negative dicircuit, or vertex numbers u^n(1),
..., u^n(n) with the following interpretation: If u^n(j) < +∞, then u^n(j) is the length
of a shortest 1→j dipath; otherwise, if u^n(j) = +∞, there is no 1→j dipath. In
addition, if there is no negative dicircuit, then for each u^n(j) < +∞, j ≠ 1, a vertex
PRED[j] is output such that (PRED[j], j) is the last arc in some shortest 1→j dipath.

begin
    u^0(1) := 0;
    for j := 2, ..., n do u^0(j) := +∞;
    for i := 1, ..., n do begin
        for j := 1, ..., n do u^i(j) := u^{i-1}(j);
        for j := 1, ..., n do begin
            for (j, k) ∈ A do begin
                if u^i(k) > u^{i-1}(j) + w(j, k) then begin
                    u^i(k) := u^{i-1}(j) + w(j, k);
                    PRED[k] := j;
                end
            end
        end
    end
    if u^n(j) < u^{n-1}(j) for some j then
        print “There exists a negative dicircuit”;
end
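The passes of Algorithm 3.4.3 can be sketched in Python as below. The representation is an assumption made for illustration: arcs are `(j, k, w)` triples over vertices 1, ..., n, and the function returns `None` rather than printing a message when a negative dicircuit is detected. Because every update reads only the previous pass’s values, the order in which arcs are scanned within a pass does not matter here, so the sketch scans the arc list directly instead of grouping arcs by tail.

```python
INF = float("inf")


def moore_bellman(n, arcs, root=1):
    """Return (u, pred) after n passes, or None if a negative dicircuit is found."""
    u_cur = [INF] * (n + 1)            # index 0 is unused padding
    u_cur[root] = 0
    u_prev = u_cur
    pred = [None] * (n + 1)
    for _ in range(n):                 # passes i = 1, ..., n
        u_prev = u_cur                 # plays the role of u^{i-1}
        u_cur = u_prev[:]              # u^i starts as a copy of u^{i-1}
        for j, k, w in arcs:
            if u_cur[k] > u_prev[j] + w:
                u_cur[k] = u_prev[j] + w
                pred[k] = j
    # u^n(j) < u^{n-1}(j) for some j signals a negative dicircuit.
    if any(u_cur[j] < u_prev[j] for j in range(1, n + 1)):
        return None
    return u_cur, pred


# Hypothetical instance without a negative dicircuit.
result = moore_bellman(4, [(1, 2, 4), (1, 3, 1), (3, 2, 2), (2, 4, -1)])

# Hypothetical instance in which the dicircuit 2 -> 3 -> 2 has length -1.
negative = moore_bellman(3, [(1, 2, 1), (2, 3, -2), (3, 2, 1)])
```

Only two arrays are kept at any time, which already reflects the storage observation made after the proof of Theorem 3.4.4.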

Theorem 3.4.4 The Moore-Bellman shortest path algorithm is correct and runs
in time O(|V||A|).

Proof. Again, we give only an outline, and, again, the key to the proof is an
appropriate interpretation of u. In this case one can show that when u^i(j) < +∞
for some vertex j and some i = 0, ..., n, then u^i(j) is the length of a shortest 1→j
dipath using at most i arcs. Certainly this is true at the start of the algorithm,
after u^0 is initialized, and it is straightforward to verify the claim in general, by
induction on i. Now, note that no simple dipath can have more than n − 1 arcs,
since D has only n vertices. Hence, if u^n(j) < u^{n-1}(j) for some j, then by the above
claimed interpretation of u, u^n(j) must be realized by a nonsimple dipath, and
this dipath must contain a negative dicircuit. The remaining parts of the proof are
routine. □
Note that Algorithm 3.4.3 is not guaranteed to find a negative dicircuit whenever
one exists, only to find one if it stands in the way of finding all shortest dipaths
starting at the root. In particular, if there is a negative dicircuit, but no vertex on
it is reachable from the root, then the algorithm will not find this dicircuit. Note
also that the implementation suggested above admits some obvious simplifications.
For notational convenience we have written the u values in such a way that a total
storage requirement of O(n²) is suggested: u^0(1), ..., u^n(n). However, the algorithm
never requires knowledge of more than O(n) of these terms: u^{i-1}(j) and u^i(j) for
j = 1, ..., n and the current i. A second simplification, or at least speedup, can be
obtained by better exploiting the information in the algorithm. In particular, while
the algorithm uses u^{i-1} in the updates for u^i, for a given pass i, it could clearly
only help to use the current values of u^i where these are smaller than u^{i-1}. This
idea leads to a theoretically faster algorithm [Yen (1970), “An algorithm for finding
shortest routes from all source nodes to a given destination in general network,”
Quarterly Journal of Applied Mathematics 27 526-530; see also [6], pages 76 and
77], but does have the disadvantage that the proof is somewhat more involved: The
interpretation of u^i(j) given in the proof is no longer quite correct.
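Both simplifications can be combined in a short Python sketch that keeps a single array and always reads the current values. This illustrates only the “use the current values” idea; it is not claimed to be Yen’s algorithm, and the function name, arc representation and early exit are assumptions made here.

```python
INF = float("inf")


def moore_bellman_inplace(n, arcs, root=1):
    """Single-array variant: returns u, or None if pass n still changes a value."""
    u = [INF] * (n + 1)                # index 0 is unused padding
    u[root] = 0
    for _ in range(n):                 # at most n passes
        changed = False
        for j, k, w in arcs:
            if u[j] + w < u[k]:        # uses the current value of u(j)
                u[k] = u[j] + w
                changed = True
        if not changed:
            return u                   # no violated constraint remains
    return None                        # still changing in pass n: negative dicircuit


# Same hypothetical instances as one might use for Algorithm 3.4.3.
u = moore_bellman_inplace(4, [(1, 2, 4), (1, 3, 1), (3, 2, 2), (2, 4, -1)])
negative = moore_bellman_inplace(3, [(1, 2, 1), (2, 3, -2), (3, 2, 1)])
```

With in-place updates a value computed during pass i may correspond to a dipath with more than i arcs, which is why the interpretation used in the proof of Theorem 3.4.4 no longer holds literally.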


3.5  Some  Miscellaneous  Results 

We  describe  here  one  result  in  some  detail,  Dijkstra’s  algorithm,  and  mention  two 
others:  the  SPP  for  acyclic  digraphs,  and  the  all-pairs  problem. 

Dijkstra’s  algorithm  [Dijkstra  (1959),  “A  note  on  two  problems  in  connexion 
with  graphs,”  Numerische  Mathematik  1  269-271]  seems  to  have  been  frequently 
discovered,  and  is  certainly  frequently  used,  especially  in  theoretical  applications. 
The reader will almost certainly encounter it, and so it seems wise to present it here.
The  situation  treated  is  that  in  which  all  arc  lengths  are  nonnegative.  Thus,  the 
problem  of  negative  dicircuits  is  automatically  settled.  The  procedure  itself  can  be 
viewed  as  a  “label  fixing”  procedure,  rather  than  a  “label  adjusting”  procedure  as  in 
the  case  of  the  Ford  and  Moore-Bellman  algorithms.  The  reason  for  this  designation 
can  be  seen  in  the  following  description  of  the  algorithm. 

The algorithm begins by assigning the root the label 0 and designating this label
as “fixed;” 0 is clearly the length of the shortest path from the root to itself. All
other vertices receive as “temporary” labels the length of the arc from the root
to that vertex, if there is one, and otherwise the temporary label +∞. Among the
vertices with temporary labels the vertex x with the smallest label is then selected,
and is designated as fixed. This label must be the length of a shortest path by virtue
of the nonnegativity of the arc lengths. Now, for each vertex y with a temporary
label that is also a successor of x, we compare the temporary label with the label
of x plus w(x, y), and update if necessary. Then we fix the label of the vertex with
the smallest temporary label, and so on.

Algorithm 3.5.1 Dijkstra’s Algorithm.

Input: An SPP (D, r, w). Assume w ≥ 0.

Output: Vertex numbers u(x) (x ∈ V) with the following interpretation: If u(x) <
+∞, then u(x) is the length of a shortest r→x dipath; otherwise, if u(x) = +∞,
there is no r→x dipath. In addition, for each u(x) < +∞, x ≠ r, a vertex PRED[x]
is output such that (PRED[x], x) is the last arc in some shortest r→x dipath.

begin
    TEMP := V;
    u(r) := 0;
    for x ∈ V\{r} do u(x) := +∞;
    while there exists x ∈ TEMP with u(x) < +∞ do begin
        select x ∈ TEMP such that u(x) = min{u(y) : y ∈ TEMP};
        TEMP := TEMP\{x};
        for (x, y) ∈ A and y ∈ TEMP do
            if u(y) > u(x) + w(x, y) then begin
                u(y) := u(x) + w(x, y);
                PRED[y] := x;
            end
    end
end
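Algorithm 3.5.1 translates almost line for line into Python. The sketch below assumes arcs are given as `(x, y, w)` triples and selects the minimum by a linear scan of TEMP, so each of the at most |V| iterations costs O(|V|) plus the out-arcs of x; the function and variable names are illustrative only.

```python
INF = float("inf")


def dijkstra(vertices, arcs, root):
    """Label-fixing shortest paths for nonnegative arc lengths."""
    out = {v: [] for v in vertices}
    for x, y, w in arcs:
        if w < 0:
            raise ValueError("Dijkstra's algorithm assumes w >= 0")
        out[x].append((y, w))
    u = {v: INF for v in vertices}
    u[root] = 0
    pred = {}
    temp = set(vertices)                        # vertices with temporary labels
    while any(u[x] < INF for x in temp):
        x = min(temp, key=lambda v: u[v])       # smallest temporary label
        temp.remove(x)                          # the label of x is now fixed
        for y, w in out[x]:
            if y in temp and u[y] > u[x] + w:
                u[y] = u[x] + w
                pred[y] = x
    return u, pred


# Hypothetical example; x is not reachable from r.
vertices = ["r", "a", "b", "s", "x"]
arcs = [("r", "a", 4), ("r", "b", 1), ("b", "a", 2), ("a", "s", 1),
        ("b", "s", 5), ("x", "r", 3)]
u, pred = dijkstra(vertices, arcs, "r")
```

The loop stops as soon as every remaining temporary label is +∞, so unreachable vertices such as x are never selected.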

Theorem 3.5.2 Dijkstra’s algorithm is correct and runs in time O(|V|²).

Remark: An O(|E| + |V| log₂ |V|) implementation of Dijkstra’s algorithm is given
in [M. L. Fredman and R. E. Tarjan (1984), “Fibonacci heaps and their uses in
improved network optimization algorithms,” Proceedings of the 25th Annual IEEE
Symposium on Foundations of Computer Science, 338-346]. See the discussion of
an O(m + n log₂ n) algorithm for minimum spanning trees at the end of Chapter 2.
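For comparison, here is a hedged sketch of a heap-based variant in Python. It uses the binary heap from the standard `heapq` module with lazy deletion of stale entries, not a Fibonacci heap, so it does not attain the bound quoted in the remark; its running time is O(|A| log |A|). The names and the arc representation are assumptions.

```python
import heapq

INF = float("inf")


def dijkstra_heap(vertices, arcs, root):
    """Dijkstra with a binary heap; stale heap entries are skipped when popped."""
    out = {v: [] for v in vertices}
    for x, y, w in arcs:
        out[x].append((y, w))
    u = {v: INF for v in vertices}
    u[root] = 0
    fixed = set()
    heap = [(0, root)]
    while heap:
        d, x = heapq.heappop(heap)
        if x in fixed:
            continue                      # outdated entry for an already fixed vertex
        fixed.add(x)
        for y, w in out[x]:
            if y not in fixed and d + w < u[y]:
                u[y] = d + w
                heapq.heappush(heap, (u[y], y))
    return u


# Hypothetical example with nonnegative lengths.
vertices = ["r", "a", "b", "s", "x"]
arcs = [("r", "a", 4), ("r", "b", 1), ("b", "a", 2), ("a", "s", 1),
        ("b", "s", 5), ("x", "r", 3)]
u_heap = dijkstra_heap(vertices, arcs, "r")
```

On sparse digraphs this is typically much faster than the O(|V|²) scan, at the price of storing several heap entries per vertex.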

Proof. Denote F = V\TEMP. Note that at any stage in the algorithm and for
y ≠ r: u(y) = min{u(x) + w(x, y) : x ∈ F, (x, y) ∈ A}, and PRED[y] ∈ F if
u(y) < +∞.

The main step of the proof is to show that at any stage of the algorithm and for
any y ∈ F, u(y) is the length of a shortest r→y dipath. This is trivially true initially
since F = ∅. Assume inductively that it is true at some stage, and suppose x is
the vertex selected for inclusion in F in the next application of the while. If x = r,
u(r) = 0 is clearly the length of some r→r dipath; otherwise, let y = PRED[x] ∈ F
and let P be a shortest r→y dipath. Write P(y, x) for the dipath P with arc (y, x)
appended. Then we have

\[ u(x) = u(y) + w(y, x) = w(P) + w(y, x) = w(P(y, x)), \]
where the first equality follows by the definition of PRED[x] and the second by the
inductive hypothesis on F. Thus, u(x) is the length of some r→x dipath.

Now suppose Q is another r→x dipath, let z be the last vertex of Q in F, let
Q_1 be the r→z portion of Q, let e be the next arc, and let Q_2 be the remaining
portion of Q. Then

\[ w(Q) = w(Q_1) + w(e) + w(Q_2) \ge u(z) + w(e) \ge u(x), \]

where the first inequality follows by the inductive hypothesis and the assumption
that w(Q_2) ≥ 0, and the second inequality by the choice of x. Combining this result
and the result of the previous paragraph we conclude that u(x) is the length of a
shortest r→x dipath.

To complete the proof is now straightforward. The asserted property of PRED
in the output is implicit in the calculation of the second paragraph of the proof. To
deduce the required properties of u note first that u(x) = +∞ at termination iff
x ∈ TEMP. Hence, the above proved property of F yields the desired properties of
u(x) when u(x) < +∞. In the remaining case, if u(x) = +∞ for some x, then
necessarily δ^-(F) = ∅ at termination, for otherwise u(y) = min{u(x) + w(x, y) :
x ∈ F, (x, y) ∈ A} < +∞ for some y ∈ TEMP. This completes the proof. □

We  close,  as  promised,  with  remarks  on  two  further  special  versions  of  the  SPP. 
Both  are  discussed  in  [6];  the  all  pairs  problem,  the  harder  of  the  two,  is  also 
discussed  in  [9]  (pages  129-133). 

Shortest Paths in Acyclic Digraphs: Let (D, r, w) be an SPP and assume that D has no
dicircuit. Such digraphs are called acyclic. Since acyclic digraphs have a fortiori no negative
dicircuits (there are no dicircuits whatsoever!), it is clear that the SPP on such digraphs is
tractable. Indeed, it turns out to be very easy. In the standard algorithms the first step is to
“topologically sort” the vertices of D, that is, to find an ordering v_1, ..., v_n (n = |V|) such that
(v_j, v_k) is an arc only if j < k. It is then not hard to see that u(v_1) = 0 and u(v_k) = min{u(v_j) +
w(v_j, v_k) : j < k, (v_j, v_k) ∈ A} (k = 2, ..., n). Solving these “equations” in a straightforward way
gives the desired solution in time O(|A|). This algorithm is at the heart of the subject called critical
path scheduling (the probabilistic version of which is PERT). The idea of topological sort is also
useful in its own right. For example, it is used in the UNIX utility ‘make’.
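The two steps, topological sort followed by solving the “equations” in order, can be sketched in Python as follows. The topological sort used here repeatedly removes vertices of in-degree zero, which is one standard method and an assumption of this sketch; the names and the `(x, y, w)` arc representation are likewise illustrative. Negative arc lengths are allowed.

```python
from collections import deque

INF = float("inf")


def dag_shortest_paths(vertices, arcs, root):
    """Return (order, u): a topological order and shortest-path lengths from root."""
    out = {v: [] for v in vertices}
    indeg = {v: 0 for v in vertices}
    for x, y, w in arcs:
        out[x].append((y, w))
        indeg[y] += 1
    # Topological sort by repeatedly removing vertices with no entering arcs.
    queue = deque(v for v in vertices if indeg[v] == 0)
    order = []
    while queue:
        x = queue.popleft()
        order.append(x)
        for y, _ in out[x]:
            indeg[y] -= 1
            if indeg[y] == 0:
                queue.append(y)
    if len(order) != len(vertices):
        raise ValueError("the digraph contains a dicircuit")
    # Process vertices in topological order; each arc is examined once.
    u = {v: INF for v in vertices}
    u[root] = 0
    for x in order:
        if u[x] < INF:
            for y, w in out[x]:
                u[y] = min(u[y], u[x] + w)
    return order, u


# Hypothetical acyclic example with one negative arc.
vertices = ["r", "a", "b", "s"]
arcs = [("r", "a", 4), ("r", "b", 1), ("b", "a", -2), ("a", "s", 3)]
order, u = dag_shortest_paths(vertices, arcs, "r")
```

Counting the sort, the work is O(|V| + |A|): every vertex is queued once and every arc is looked at twice.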

All Pairs SPP: Given a digraph D with arc lengths w, the problem is to find shortest x→y dipaths
for all pairs of vertices x, y. Clearly, this problem can be solved directly by n = |V| applications of
the Moore-Bellman procedure, once for each of n different choices of root. This gives a worst-case
bound of O(n⁴). There is, however, a better method. Using an idea of Floyd, S. Warshall showed
how to solve this problem in time O(n³), the same as for the usual SPP! The algorithm is not
difficult. See [6] for details.
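For orientation, the triple loop usually known as the Floyd-Warshall algorithm is sketched below in Python. The notes defer the details to [6], so the vertex numbering 0, ..., n − 1, the arc triples and the function name are assumptions of this sketch, which also assumes there is no negative dicircuit.

```python
INF = float("inf")


def floyd_warshall(n, arcs):
    """All-pairs shortest-path lengths for vertices 0..n-1 (no negative dicircuits)."""
    d = [[INF] * n for _ in range(n)]
    for i in range(n):
        d[i][i] = 0
    for x, y, w in arcs:
        d[x][y] = min(d[x][y], w)          # keep the shortest of parallel arcs
    # After step k, d[i][j] is the length of a shortest i -> j dipath whose
    # intermediate vertices all lie in {0, ..., k}.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


# Hypothetical example with one negative arc.
d = floyd_warshall(4, [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, -1)])
```

The three nested loops over n vertices give the O(n³) running time directly.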


Exercises 


3.1 Let D = (V, A) be a digraph with a root vertex r. A rooted arborescence of D is a
spanning tree T such that for every vertex x, the unique r-x path in T is an r→x dipath.

Algorithm. Simplex Algorithm for Shortest Paths.

Input: An SPP (D, r, w) and an arborescence T rooted at r. Assume D has no
negative dicircuits.

Output: An optimal solution of (3.1) (same as Ford’s algorithm).

begin
    for x ∈ V do u(x) := w(P) where P is the r→x dipath in T;
    while u(h(e)) > u(t(e)) + w(e) for some e ∈ A\T do begin
        f := arc of T with h(f) = h(e);
        T' := component of T\{f} containing h(e);
        Γ := u(h(e)) − u(t(e)) − w(e);
        for x ∈ V(T') do u(x) := u(x) − Γ;
        T := (T\{f}) ∪ {e};
    end
end


Start with the arborescence


and  show  that  the  above  algorithm  can  take  15  iterations  to  solve  the  shortest  path 
problem. 

(b) Consider the following digraph (due to J. Edmonds):

Show that the above algorithm can take 3 · [...] − 2n − 5 iterations to solve this
problem. Deduce that it is not possible to give a polynomial time bound for Ford’s
Algorithm.


Chapter  4 


Introduction  to  Polyhedral 
Combinatorics 

4.1  Introduction 

We begin with a brief introduction to linear programming and several of its
associated polyhedral concepts. We then consider the traveling salesman problem, as
an example of how linear programming techniques can be applied to hard
combinatorial problems. This will serve to motivate some of the ideas that come in later
chapters.

An excellent linear programming book is [4]. For a thorough treatment of
polyhedral preliminaries see Chapters 7 and 8 of [10]; included in these chapters are
proofs of (4.1.2) and (4.1.4), which are not proved in these notes.

A  linear  program,  or  LP,  is  an  optimization  problem  of  the  form 

\[
\text{(Primal)} \qquad
\begin{array}{rl}
\min & c^T x \\
\text{s.t.} & Ax \ge b \\
 & x \ge 0
\end{array}
\qquad (4.1)
\]

where, for some positive integers m and n, c is an n x 1 column vector, b is an m x 1
column vector, A is an m x n matrix and x is an n x 1 column vector of variables.
c^T x is the objective function and the inequalities in Ax ≥ b are the constraints. Any
x ≥ 0 such that Ax ≥ b is called feasible; {x ≥ 0 : Ax ≥ b} is the feasible region.

The  form  of  (4.1)  is,  of  course,  not  the  most  general  form  an  LP  can  take.  We 
could  as  well  consider  maximum  problems  and  mixed  constraints,  including  equality 
as  well  as  inequality  constraints.  One  can  also  consider  variables  with  non-trivial 
upper  and  lower  bounds,  or  variables  without  any  bounds  whatsoever,  so-called  free 
variables. 
The dual of (4.1) is the LP

\[
\text{(Dual)} \qquad
\begin{array}{rl}
\max & b^T y \\
\text{s.t.} & A^T y \le c \\
 & y \ge 0
\end{array}
\qquad (4.2)
\]

When  discussing  the  dual,  (4.1)  is  called  the  primal  (as  indicated  in  (4.1)).  A  simple 
but  important  result  for  dual  pairs  of  LPs  is  the  following. 

Theorem 4.1.1 (Weak Duality Theorem) If x and y are feasible solutions of (4.1)
and (4.2), respectively, then c^T x ≥ b^T y. In particular, if c^T x = b^T y, then x and y
are optimal solutions of (4.1) and (4.2), respectively.

Proof. A straightforward application of the conditions of the theorem yields

\[ c^T x \ge (y^T A)x = y^T (Ax) \ge y^T b = b^T y. \] □
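The chain of inequalities in the proof can be checked numerically on a small instance. The Python sketch below uses hypothetical data (the matrix A and the vectors b, c, x, y are made up for illustration) and plain lists rather than a linear-algebra library.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def mat_vec(M, v):
    """Matrix-vector product, with M given as a list of rows."""
    return [dot(row, v) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def primal_feasible(A, b, x):
    """x >= 0 and Ax >= b, as in (4.1)."""
    return all(xi >= 0 for xi in x) and all(
        lhs >= rhs for lhs, rhs in zip(mat_vec(A, x), b))


def dual_feasible(A, c, y):
    """y >= 0 and A^T y <= c, as in (4.2)."""
    return all(yi >= 0 for yi in y) and all(
        lhs <= rhs for lhs, rhs in zip(mat_vec(transpose(A), y), c))


# Hypothetical data for a 2 x 2 instance.
A = [[1, 1], [1, 2]]
b = [2, 3]
c = [3, 4]

x, y = [3, 0], [1, 1]          # a feasible pair with a gap: 9 >= 6 >= 5
chain = (dot(c, x), dot(mat_vec(transpose(A), y), x), dot(y, b))

x_opt, y_opt = [1, 1], [2, 1]  # a feasible pair with equal objective values
```

For the second pair both objective values equal 7, so by the theorem each is optimal for its own problem.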

There are several alternative versions of this theorem based on alternate forms
for the primal LP (and, hence, the dual LP). For example, if the primal has the form
min{c^T x : Ax = b, x ≥ 0}, involving only equality constraints, then the dual has the
form max{b^T y : A^T y ≤ c} where the y variables are free. Examining the proof of
weak duality suggests why: The second inequality in the proof is an equality, and
so the nonnegativity of the y variables is not needed.

Much  deeper  than  (4.1.1)  is  the  following  result,  widely  attributed  to  John  von 
Neumann. 

Theorem 4.1.2 (Strong Duality Theorem) If either (4.1) or (4.2) has an optimal
solution, then both have optimal solutions and the optimal values are equal. □

The  strong  duality  theorem  can  be  proved  constructively  using  G.  B.  Dantzig’s 
simplex  algorithm.  It  can  also  be  proved  using  a  separating  hyperplane  theorem, 
such  as  the  Farkas  Lemma  (see  [10]). 

We now introduce some of the polyhedral theory associated with LPs. As above,
A is an m x n matrix, and x and b are (column) vectors of appropriate dimension.
A set of the form P = {x : Ax ≤ b} is called a polyhedron. A set C ⊆ R^n is
called a (convex) cone if for any x, y ∈ C and any scalars α, β ≥ 0, αx + βy ∈ C.
Given vectors y^1, ..., y^k ∈ R^n, and scalars α_1, ..., α_k ≥ 0, the linear combination
α_1 y^1 + ... + α_k y^k of y^1, ..., y^k is called a positive combination. If, in addition,
α_1 + ... + α_k = 1, it is called a convex combination. For X ⊆ R^n, pos X denotes
the positive hull of X, the set of all finite positive combinations of vectors in X, and
conv X denotes the convex hull, the set of all finite convex combinations of vectors in
X. Clearly pos X is a cone. If conv X = X, then X is convex. Cones and polyhedra
are examples of convex sets. If X is finite, we say that C = pos X is finitely
generated. Sets of the form conv X where X is finite are called polytopes. Finally,
define a point x in a convex set X to be an extreme point of X if x = αy + (1 − α)z
for 0 < α < 1 and y, z ∈ X implies x = y = z. Thus, x is extreme if it is not the
convex combination of distinct other points in X.

Proposition 4.1.3 A vector y ∈ R^n is an extreme point of P = {x : Ax ≤ b} iff
y ∈ P and y is the unique solution of some subsystem A'x = b' of Ax ≤ b.

Proof. First we show the necessity of the condition. Let y be an extreme point of
P, and let A'x = b' be the maximal subsystem such that A'y = b'. Let A''x ≤ b'' be
the subsystem of remaining inequalities in Ax ≤ b. If y is not the unique solution
of A'x = b', then there is a nonzero vector z such that A'z = 0 (take z to be the
difference of two distinct solutions of A'x = b'). Since A''y < b'', and this system
is finite, there is some μ > 0 such that Ax^j ≤ b (j = 1, 2) where x^1 = y − μz and
x^2 = y + μz. But then y = ½x^1 + ½x^2, a contradiction. Hence y is the unique
solution of A'x = b', as required.

Now to prove sufficiency, suppose $y \in P$ and $y$ is the unique solution of the
subsystem $A'x = b'$. Suppose $y = \alpha x + (1 - \alpha)z$ where $x, z \in P$ and $0 < \alpha < 1$.
Now we have

$$b' = A'y = \alpha A'x + (1 - \alpha)A'z \le \alpha b' + (1 - \alpha)b' = b'.$$

Hence, $0 < \alpha < 1$ implies $A'x = A'z = b'$, and so $y = x = z$, as required. □
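Proposition 4.1.3 translates into a finite test: $y$ is an extreme point iff $y \in P$ and the rows of $A$ that are tight at $y$ have rank $n$, so that the tight subsystem has $y$ as its only solution. The following is a minimal sketch of that test for small, explicitly given systems; the function names `rank` and `is_extreme_point` are illustrative and do not come from the notes.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_extreme_point(A, b, y):
    """Test of Proposition 4.1.3: y lies in P and the tight rows have rank n."""
    lhs = [sum(Fraction(a) * Fraction(v) for a, v in zip(row, y)) for row in A]
    if any(l > bi for l, bi in zip(lhs, b)):
        return False  # y violates some inequality, so y is not in P
    tight = [row for row, l, bi in zip(A, lhs, b) if l == bi]
    return rank(tight) == len(y)

# Unit square: x1 <= 1, x2 <= 1, -x1 <= 0, -x2 <= 0.
A = [[1, 0], [0, 1], [-1, 0], [0, -1]]
b = [1, 1, 0, 0]
print(is_extreme_point(A, b, [1, 1]))               # a corner
print(is_extreme_point(A, b, [1, Fraction(1, 2)]))  # midpoint of an edge
```

For the unit square the corner passes the test, while the edge midpoint fails it because only one inequality is tight there.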

For sets $A, B \subseteq \mathbb{R}^n$, define $A + B = \{a + b : a \in A, b \in B\}$, the algebraic sum of
$A$ and $B$.

Theorem 4.1.4 (Farkas-Minkowski-Weyl) $P \subseteq \mathbb{R}^n$ is a polyhedron iff

$$P = \text{conv } X + \text{pos } Y,$$

where $X$ and $Y$ are finite subsets of $\mathbb{R}^n$. □

It is natural to ask when we can put special conditions on the vectors in $X$ and
$Y$. Define the lineality space of a convex set $P$, lin $P$, by lin $P = \emptyset$ if $P = \emptyset$, and
otherwise lin $P = \{x : y + \alpha x \in P$ for all $y \in P$ and $\alpha \in \mathbb{R}\}$. If $P = \{x : Ax \le b\}$ is
a polyhedron then it is easy to see that lin $P = \{x : Ax = 0\}$. We can also prove,
using (4.1.3), that $P$ has an extreme point iff lin $P = \{0\}$. Clearly lin $P = \{0\}$ is
necessary for this. To see the converse, assume lin $P = \{0\}$, and take $y \in P$ such
that the maximal subsystem $A'y = b'$ of $Ay \le b$, satisfied at equality, has as many
rows as possible. If $y$ is not the unique solution of $A'x = b'$, and hence not an
extreme point, then there is a nonzero $x$ such that $A'x = 0$. As $\{x : Ax = 0\} = \{0\}$,
$Ax \ne 0$, and so for some scalar $\alpha$, $y + \alpha x \in P$ satisfies more equalities than $y$, a
contradiction.

Proposition 4.1.5 Let $P$ be a nonempty polyhedron. Then lin $P = \{0\}$ iff there
exist finite sets $X$ and $Y$, $X \ne \emptyset$, such that $P = $ conv $X$ + pos $Y$ and such that $X$
is the set of extreme points of $P$.

Proof. If $P$ has an extreme point, then lin $P = \{0\}$. This proves one direction of
the proposition. To prove the other, first apply Theorem 4.1.4 to represent $P$
as conv $X$ + pos $Y$ for finite sets $X$ and $Y$. Assume that $X$ is chosen to be minimal.

Suppose $z = x + y$ is an extreme point of $P$, where $x \in$ conv $X$ and $y \in$ pos $Y$. If
$y \ne 0$, then $z = \frac{1}{2}x + \frac{1}{2}(x + 2y)$ and $x \ne x + 2y$, a contradiction. Hence, $z \in$ conv $X$.
But then obviously $z \in X$. It follows that $X$ contains all extreme points of $P$.

Now suppose $z \in X$ is not extreme and consider the following calculation. Again,
write $z = x + y$, where $x \in$ conv $X$ and $y \in$ pos $Y$, and assume that $y \ne 0$. If
$x \in \text{conv}(X \setminus \{z\})$, clearly we may delete $z$ from $X$, contradicting the minimality of
$X$. Otherwise $x = \alpha z + (1 - \alpha)x'$ where $x' \in \text{conv}(X \setminus \{z\})$ and $0 < \alpha < 1$. But
then $z = x' + y/(1 - \alpha)$, and again we conclude that $z$ may be deleted from $X$.

Now, using the fact that $z$ is not extreme we have

$$z = \alpha(x^1 + \mu_1 y^1) + (1 - \alpha)(x^2 + \mu_2 y^2) \qquad (4.3)$$

where $x^1 + \mu_1 y^1 \ne x^2 + \mu_2 y^2$, $0 < \alpha < 1$ and $x^j \in$ conv $X$, $y^j \in$ pos $Y$, $\mu_j \ge 0$ $(j = 1, 2)$.
By the calculation of the previous paragraph we may assume
$\alpha\mu_1 y^1 + (1 - \alpha)\mu_2 y^2 = 0$, which implies $\{+\mu_j y^j, -\mu_j y^j\} \subseteq$ pos $Y$ $(j = 1, 2)$, and so lin $P = \{0\}$
implies $\mu_1 y^1 = \mu_2 y^2 = 0$. We conclude that $z = \alpha x^1 + (1 - \alpha)x^2$ and $x^1 \ne x^2$.
This will complete the proof if we can show that $x^1, x^2 \in \text{conv}(X \setminus \{z\})$. If not,
suppose say $x^1 = \beta z + (1 - \beta)x^3$ where $0 < \beta < 1$ and $x^3 \in \text{conv}(X \setminus \{z\})$. Then

$$z = \frac{1 - \alpha}{1 - \alpha\beta}\, x^2 + \frac{\alpha(1 - \beta)}{1 - \alpha\beta}\, x^3,$$

which is a convex combination of $x^2$ and $x^3$. Repeating this argument for $x^2$, if
necessary, completes the proof. □

Applying (4.1.5) to linear programming, we see that if the feasible region of an
LP is bounded and nonempty, then it has an extreme-point optimal solution.


4.2  The  Traveling  Salesman  Problem 

We consider here a formulation of the TSP that is superficially different from that
given in Chapter 1, but is more convenient for our current purposes. Let $K_n$ be the
complete digraph on $n$ vertices. Thus, $K_n$ is a digraph with $n$ vertices, labeled say
$1, \ldots, n$, and all $n(n - 1)$ possible arcs $(i, j)$ for $1 \le i, j \le n$, $i \ne j$. Given a weight
function $w$ defined on the arcs of $K_n$, the TSP may be defined as the problem of
finding a dicircuit including all vertices and having minimum total weight. As in
Example 1.1.2, we call a dicircuit including all vertices a tour.

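Let us try to formulate the TSP as an LP. We begin with the following integer
linear program (ILP):

$$
\begin{array}{lll}
\min & w^T x & \\
\text{s.t.} & \sum_{j \ne k} x_{jk} = 1 & (k = 1, \ldots, n) \\
 & \sum_{k \ne j} x_{jk} = 1 & (j = 1, \ldots, n) \\
 & x_{jk} \in \{0, 1\} & (\text{all } j, k)
\end{array}
\qquad (4.4)
$$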


where the $x_{jk}$ have the interpretation of selecting or not selecting the arc $(j, k)$ when
$x_{jk} = 1$ or $0$, respectively. The first $n$ constraints say that a tour must enter each
vertex exactly once, and the second $n$ constraints that a tour must leave each vertex
exactly once. (4.4) is called an ILP because it is an LP except for the restriction
that all variables take on integral values.

Figure 4.1: The entry in row i and column j specifies the weight, w(i,j), of arc

Example 4.2.1 Consider the symmetric weighting of $K_5$ given in Figure 4.1. ILP
(4.4) then has 20 variables and 10 constraints, each involving 4 variables:

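$$
\begin{array}{rl}
\min & 3x_{12} + 5x_{13} + 2x_{14} + 7x_{15} + 6x_{23} + 4x_{24} + x_{25} + 3x_{21} + 2x_{34} + 8x_{35} \\
 & {} + 5x_{31} + 6x_{32} + 3x_{45} + 2x_{41} + 4x_{42} + 2x_{43} + 7x_{51} + x_{52} + 8x_{53} + 3x_{54} \\
\text{s.t.} & x_{21} + x_{31} + x_{41} + x_{51} = 1 \\
 & x_{12} + x_{32} + x_{42} + x_{52} = 1 \\
 & x_{13} + x_{23} + x_{43} + x_{53} = 1 \\
 & x_{14} + x_{24} + x_{34} + x_{54} = 1 \\
 & x_{15} + x_{25} + x_{35} + x_{45} = 1 \\
 & x_{12} + x_{13} + x_{14} + x_{15} = 1 \\
 & x_{23} + x_{24} + x_{25} + x_{21} = 1 \\
 & x_{34} + x_{35} + x_{31} + x_{32} = 1 \\
 & x_{45} + x_{41} + x_{42} + x_{43} = 1 \\
 & x_{51} + x_{52} + x_{53} + x_{54} = 1 \\
 & x_{jk} = 0 \text{ or } 1 \quad (\text{all } j, k)
\end{array}
\qquad (4.5)
$$
□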

The ILP (4.4) is not a correct formulation of the TSP because, while it does
guarantee that all vertices are visited, it allows for a solution made up of several
subtours, smaller dicircuits that include only a subset of the vertices (see the
continuation of Example 4.2.1 below). Nevertheless, (4.4) does have one very nice
property. Even though it is formulated as an ILP, rather than an LP, and integer
programming is, in general, NP-hard, in this case the integrality restrictions are
redundant: We can replace (4.4) by its LP relaxation, that is, replace each of the
integrality restrictions $x_{jk} \in \{0, 1\}$ by $0 \le x_{jk} \le 1$, and the answer does not change.
We shall prove this shortly.

Example 4.2.2 ((4.2.1) continued) The solution of the LP relaxation of (4.5) is
$x_{13} = x_{25} = x_{34} = x_{41} = x_{52} = 1$, with all other variables equal to 0. This solution
has total weight 11, and, as claimed above, is indeed integral. Note however that it
is made up of two subtours: (1, 3, 4, 1) and (2, 5, 2). □
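Since the LP relaxation has an integral optimal solution, its optimal value for this small instance can be cross-checked by enumerating the 0/1 solutions of (4.4), which are exactly the permutations of the vertices with no fixed point. The sketch below does this by brute force; it is only a check for a 5-vertex instance, not a practical method. The weights `W` are read off the objective function of (4.5), and the helper names are illustrative.

```python
from itertools import permutations

# Symmetric weights read off the objective function of (4.5).
W = {(1, 2): 3, (1, 3): 5, (1, 4): 2, (1, 5): 7, (2, 3): 6,
     (2, 4): 4, (2, 5): 1, (3, 4): 2, (3, 5): 8, (4, 5): 3}
V = [1, 2, 3, 4, 5]

def w(j, k):
    """Weight of arc (j, k); the weighting is symmetric."""
    return W[(j, k)] if (j, k) in W else W[(k, j)]

def cycles(succ):
    """Split a successor map {j: k} into its dicircuits (lists of vertices)."""
    seen, result = set(), []
    for start in succ:
        if start in seen:
            continue
        cyc, v = [], start
        while v not in seen:
            seen.add(v)
            cyc.append(v)
            v = succ[v]
        result.append(cyc)
    return result

def best_assignment():
    """Minimum-weight 0/1 solution of (4.4): a permutation with no fixed point."""
    best = None
    for perm in permutations(V):
        if any(j == k for j, k in zip(V, perm)):
            continue  # there is no variable x_jj
        cost = sum(w(j, k) for j, k in zip(V, perm))
        if best is None or cost < best[0]:
            best = (cost, dict(zip(V, perm)))
    return best

cost, succ = best_assignment()
print(cost, cycles(succ))
```

The enumeration returns weight 11 and a solution that splits into one dicircuit on {1, 3, 4} and one on {2, 5}, in agreement with the example.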

Before attacking the subtour elimination problem, we prove the above claimed
integrality property of the (4.4) formulation.

Proposition 4.2.3 The LP relaxation of (4.4) has an integral optimal solution.

Proof. Let (4.4') denote the LP relaxation of (4.4). Let $Ax = b$ be the system of
equality constraints in (4.4). We show that $A$ is totally unimodular, that is, that
the determinant of every square submatrix of $A$ has value $\pm 1$ or 0. To see that this
will prove the proposition, note that the set of feasible solutions of (4.4') is clearly
nonempty and bounded (bounded because each variable is individually bounded,
and nonempty because any tour is a feasible solution). Hence, by Proposition 4.1.5, the
associated LP has an optimal solution that is an extreme point. But by Proposition
4.1.3, any extreme point of the feasible region can be obtained by setting some
of the $x_{jk}$ to 0 or 1, and solving the resulting subsystem of $Ax = b$ uniquely for
the remaining variables. Cramer's rule together with the total unimodularity of $A$
implies that this solution must be integral.

To prove the total unimodularity claim, assume it is false and let $B$ be a minimal
square submatrix of $A$ with determinant other than $\pm 1$ or 0. Note that the rows of
$A$ partition into two sets such that, within each set, every column contains exactly one
1, and otherwise 0s. This partition induces a partition of the rows of $B$ such that
each column has at most one 1 in each set. If, in fact, every column has exactly
one 1 in each set, then summing the rows in each part of the partition we obtain a
vector of all 1s. Since the two sums are equal, the rows of $B$ are linearly dependent.
Hence, $B$ is singular and has determinant 0, a contradiction.

Similarly, there can be no column of all 0s. Hence, there is a column with
exactly one 1. But then expanding the determinant of $B$ on this column, we obtain
$\det B = \pm \det B'$, where $B'$ is a proper submatrix of $B$. This is a contradiction,
since by the minimality of $B$, $\det B'$ is $\pm 1$ or 0. □
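For a very small case, the total unimodularity claim can also be confirmed by direct enumeration. The sketch below builds the equality-constraint matrix of (4.4) for $n = 3$ (six constraints, six arcs) and checks every square submatrix. This is an exponential brute-force check meant purely as an illustration; the function names are hypothetical, and vertices are numbered from 0 for convenience.

```python
from itertools import combinations

def degree_matrix(n):
    """Equality constraints of (4.4): n in-degree rows, then n out-degree rows.

    Columns are indexed by the arcs (j, k), j != k, of the complete digraph.
    """
    arcs = [(j, k) for j in range(n) for k in range(n) if j != k]
    rows = [[1 if k == v else 0 for (j, k) in arcs] for v in range(n)]
    rows += [[1 if j == v else 0 for (j, k) in arcs] for v in range(n)]
    return rows

def det(m):
    """Integer determinant by cofactor expansion along the first row."""
    if not m:
        return 1
    total = 0
    for c, entry in enumerate(m[0]):
        if entry:
            minor = [row[:c] + row[c + 1:] for row in m[1:]]
            total += (-1) ** c * entry * det(minor)
    return total

def is_totally_unimodular(a):
    """Check that every square submatrix has determinant -1, 0 or 1."""
    rows, cols = len(a), len(a[0])
    for size in range(1, min(rows, cols) + 1):
        for r in combinations(range(rows), size):
            for c in combinations(range(cols), size):
                if det([[a[i][j] for j in c] for i in r]) not in (-1, 0, 1):
                    return False
    return True

print(is_totally_unimodular(degree_matrix(3)))
# A 0/1 matrix that is not totally unimodular (its determinant is 2):
print(is_totally_unimodular([[1, 1, 0], [0, 1, 1], [1, 0, 1]]))
```

The second matrix shows that having 0/1 entries is not enough by itself; the row partition used in the proof is what makes the argument work.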

The idea of the above proof, that of using total unimodularity to show that a
particular ILP can be solved as an LP, is due to A. Hoffman and J. B. Kruskal (1956)
[“Integral  boundary  points  of  convex  polyhedra,”  in  Linear  Inequalities  and  Related 
Systems,  H.  W.  Kuhn  and  A.  W.  Tucker,  eds.,  Princeton  Univ.  Press,  Princeton, 
N.  J.,  223-246]. 

Let us now turn to the problem of eliminating subtours. C. E. Miller, A. W.
Tucker and R. A. Zemlin (1960) [“Integer programming formulations and traveling
salesman problems,” Journal of the Association for Computing Machinery 7 326-329]
proposed the following clever formulation:

$$
\begin{array}{lll}
\min & w^T x & \\
\text{s.t.} & \sum_{k \ne j} x_{jk} = 1 & (j = 1, \ldots, n) \\
 & \sum_{j \ne k} x_{jk} = 1 & (k = 1, \ldots, n) \\
 & u_j - u_k + n x_{jk} \le n - 1 & (2 \le j, k \le n,\ j \ne k) \\
 & x_{jk} \in \{0, 1\} & (\text{all } j, k,\ j \ne k)
\end{array}
\qquad (4.6)
$$

We leave it to the reader to verify that this formulation does indeed eliminate
subtours, and is thus a correct ILP formulation of the TSP. This would seem to
imply that we have in fact found an acceptable formulation of the TSP. However,
there is a difficulty. While (4.6) is a correct ILP formulation of the TSP, it is also a
more typical ILP than (4.4) in that it does not have the integrality property proved
in (4.2.3). Obviously the new constraint matrix is not totally unimodular (it doesn't
even have entries restricted to $\{0, \pm 1\}$), and so certainly the idea we applied to (4.4)
cannot be applied.


Example 4.2.4 ((4.2.1) continued) (4.6) adds 12 constraints to (4.5). An optimal
solution to the associated LP relaxation is $x_{13} = 1$, $x_{25} = 0.6$, $x_{21} = 0.4$, $x_{34} = 1$,
$x_{45} = 0.4$, $x_{41} = 0.6$, $x_{52} = 1$, $u_2 = 1.0$, $u_4 = 2.0$, which has total weight 12.2. □
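The fractional point quoted in Example 4.2.4 can be checked against the constraints of (4.6) with exact arithmetic. The sketch below verifies feasibility and the total weight only; it does not establish optimality. The weights are read off the objective of (4.5). The example quotes only $u_2$ and $u_4$, so taking $u_3 = u_5 = 0$ is an assumption made here.

```python
from fractions import Fraction as F

n = 5
V = range(1, n + 1)

# Symmetric weights read off the objective function of (4.5).
W = {(1, 2): 3, (1, 3): 5, (1, 4): 2, (1, 5): 7, (2, 3): 6,
     (2, 4): 4, (2, 5): 1, (3, 4): 2, (3, 5): 8, (4, 5): 3}

def w(j, k):
    return W[(j, k)] if (j, k) in W else W[(k, j)]

# Fractional solution quoted in Example 4.2.4; all other x_jk are 0.
x = {(1, 3): F(1), (2, 5): F(3, 5), (2, 1): F(2, 5), (3, 4): F(1),
     (4, 5): F(2, 5), (4, 1): F(3, 5), (5, 2): F(1)}
# u_2 and u_4 are quoted in the example; u_3 = u_5 = 0 is an assumption.
u = {2: 1, 3: 0, 4: 2, 5: 0}

def val(j, k):
    return x.get((j, k), F(0))

# Each vertex is left exactly once and entered exactly once (fractionally).
out_ok = all(sum(val(j, k) for k in V if k != j) == 1 for j in V)
in_ok = all(sum(val(j, k) for j in V if j != k) == 1 for k in V)
# The 12 Miller-Tucker-Zemlin constraints on vertices 2, ..., n.
mtz_ok = all(u[j] - u[k] + n * val(j, k) <= n - 1
             for j in range(2, n + 1) for k in range(2, n + 1) if j != k)
weight = sum(w(j, k) * v for (j, k), v in x.items())

print(out_ok, in_ok, mtz_ok, float(weight))
```

All three groups of constraints hold, and the weight evaluates to 61/5 = 12.2, strictly between the value 11 of the relaxation of (4.5) and the weight of any tour.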


In an attempt to more clearly understand the difficulties that arise in (4.4) and
(4.6), we consider a “polyhedral approach.” To this end, let $A$ be the set of arcs of
$K_n$. For $I \subseteq A$ define the incidence or characteristic vector of $I$, $x^I$, by $x^I(a) = 1$
if $a \in I$ and $x^I(a) = 0$ if $a \in A \setminus I$. Let $\mathcal{I}$ be the family of subsets of arcs of $A$ that
form tours. Let

$$P_{TSP} = \text{conv}\{x^I : I \in \mathcal{I}\}.$$

Then it is easy to see that solving $\min\{w^T x : x \in P_{TSP}\}$ solves the TSP on $K_n$ for
a given weighting $w$: If $x^*$ is an optimal solution of this optimization problem, then
$x^* \in P_{TSP}$ implies $x^*$ is a convex combination of a finite set of incidence vectors of
tours, $x^* = \sum_{j \in J} \alpha_j x^{I_j}$. If $w^T x^* < w^T x^{I_j}$ $(j \in J)$, then clearly

$$w^T x^* = \sum_{j \in J} \alpha_j w^T x^{I_j} > \sum_{j \in J} \alpha_j w^T x^* = w^T x^*,$$

a contradiction. Hence, some $I_j$ $(j \in J)$ is optimal.

Denote the feasible regions of the LP relaxations of (4.4) and (4.6) by Pa and
Pi, respectively. Clearly, both Pa and Pi contain $P_{TSP}$; the difficulty is that both
properly contain $P_{TSP}$, and that they are not good enough approximations. The
following formulation provides a better approximation. It was suggested in a seminal
paper by G. B. Dantzig, D. R. Fulkerson and S. M. Johnson (1954) [“Solution of a
large scale traveling salesman problem,” Operations Research 2 393-410]:


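$$
\begin{array}{lll}
\min & w^T x & \\
\text{s.t.} & \sum_{k \ne j} x_{jk} = 1 & (j = 1, \ldots, n) \\
 & \sum_{j \ne k} x_{jk} = 1 & (k = 1, \ldots, n) \\
 & x(\delta^-(X)) \ge 1 & (\emptyset \subset X \subset V) \\
 & x_{jk} \in \{0, 1\} & (\text{all } j, k,\ j \ne k)
\end{array}
\qquad (4.7)
$$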


Note that the constraints $x(\delta^-(X)) \ge 1$ do eliminate subtours since if $x$ is integral
and $X$ is the vertex-set of some subtour determined by $x$, then $x(\delta^-(X)) = 0$.
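Because a violated constraint can be read directly off an integral solution that contains a subtour, such constraints can be added one at a time rather than all at once. The sketch below illustrates this on the 5-vertex instance of Example 4.2.1. It is a simplified stand-in, not the procedure of Dantzig, Fulkerson and Johnson: it works with 0/1 solutions only and re-solves by brute-force enumeration. It assumes that $\delta^-(X)$ denotes the arcs entering $X$; the weights are read off the objective of (4.5), and all function names are illustrative.

```python
from itertools import permutations

# Symmetric weights read off the objective function of (4.5).
W = {(1, 2): 3, (1, 3): 5, (1, 4): 2, (1, 5): 7, (2, 3): 6,
     (2, 4): 4, (2, 5): 1, (3, 4): 2, (3, 5): 8, (4, 5): 3}
V = [1, 2, 3, 4, 5]

def w(j, k):
    return W[(j, k)] if (j, k) in W else W[(k, j)]

def cycles(succ):
    """Split a successor map {j: k} into its dicircuits (lists of vertices)."""
    seen, result = set(), []
    for start in succ:
        if start in seen:
            continue
        cyc, v = [], start
        while v not in seen:
            seen.add(v)
            cyc.append(v)
            v = succ[v]
        result.append(cyc)
    return result

def solve(cuts):
    """Cheapest fixed-point-free permutation with an arc entering each X in cuts."""
    best = None
    for perm in permutations(V):
        succ = dict(zip(V, perm))
        if any(j == k for j, k in succ.items()):
            continue
        if any(not any(j not in X and k in X for j, k in succ.items())
               for X in cuts):
            continue  # some subtour elimination constraint is violated
        cost = sum(w(j, k) for j, k in succ.items())
        if best is None or cost < best[0]:
            best = (cost, succ)
    return best

def constraint_generation():
    """Add one violated subtour constraint at a time until a tour appears."""
    cuts = []
    while True:
        cost, succ = solve(cuts)
        parts = cycles(succ)
        if len(parts) == 1:
            return cost, parts[0], cuts
        cuts.append(frozenset(parts[0]))  # vertex set X of one subtour

cost, tour, cuts = constraint_generation()
print(cost, tour, [sorted(X) for X in cuts])
```

On this instance a single added constraint, the one for $X = \{1, 3, 4\}$, already forces a tour, and that tour has weight 14.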

At first (4.7) would seem to be less useful than (4.6), since the number of subtour
elimination constraints is now exponential, $2^n - 2$. For any reasonably large problem,
(4.7) cannot even be written down. However, in their paper Dantzig, Fulkerson and
Johnson found that they could “generate” these constraints as they needed them
by using a network-flow technique. In this way they were actually able to solve
by hand the 48-city TSP ($n = 48$) corresponding to traveling through all the state
capitals in the continental US.

Example 4.2.5 ((4.2.1) continued) Formulation (4.7) adds $2^5 - 2 = 30$ subtour
elimination constraints to (4.6). The optimal solution to the LP relaxation is
$x_{12} = x_{25} = x_{31} = x_{43} = x_{54} = 1$ with total weight 14. Since the solution is
integral, and contains no subtours, it must be optimal for the original TSP. Indeed,
a more detailed examination of the associated LP solution, and in particular the
optimal values of the dual variables (not shown here, but provided by any standard
solution technique) reveals even more information: Most of the dual variables
corresponding to subtour elimination constraints are 0, and hence these constraints are
not binding on the solution, so they can be deleted. Indeed, only one of the additional
30 constraints is needed to produce the desired optimal solution, the one for $X = \{1, 3, 4\}$:

$$x_{12} + x_{15} + x_{35} + x_{32} + x_{45} + x_{42} \ge 1 \qquad (4.8)$$

Note that this constraint explicitly eliminates the subtours (1, 3, 4, 1) and (2, 5, 2)
found in solving (4.5). Thus, we could have found it by first solving (4.5) and then
examining the solution.

If one writes out the dual of (4.5) with (4.8) appended, it is readily checked that
$y^T = [6 \;\; 1 \;\; 8 \;\; 3 \;\; 4 \;\; {-3} \;\; {-3} \;\; {-1} \;\; {-6} \;\; 0 \;\; 5]$ is feasible. Note in particular that $y$ has
11 components, the first 10 corresponding to the constraints of (4.5) and the last
corresponding to (4.8). It also has some negative components, permissible since
the LP primal has equality constraints corresponding to these components. Finally,
note that the sum of the coordinates of $y$ is 14. This proves, using weak duality,
that a tour of weight 14 is a minimum-weight tour. □
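The dual feasibility claimed in Example 4.2.5 can be checked arc by arc. The sketch below reads $y$ as (6, 1, 8, 3, 4, -3, -3, -1, -6, 0, 5). It assumes that the first five components belong to the "enter vertex $k$" constraints and the next five to the "leave vertex $j$" constraints, in the order in which (4.5) lists them, and that the last component is the multiplier of (4.8). With the primal written as a minimization over $x \ge 0$, the dual constraint for arc $(j, k)$ is then: leave-multiplier of $j$ plus enter-multiplier of $k$, plus the multiplier of (4.8) if the arc occurs in (4.8), is at most $w(j, k)$.

```python
# Symmetric weights read off the objective function of (4.5).
W = {(1, 2): 3, (1, 3): 5, (1, 4): 2, (1, 5): 7, (2, 3): 6,
     (2, 4): 4, (2, 5): 1, (3, 4): 2, (3, 5): 8, (4, 5): 3}
V = range(1, 6)

def w(j, k):
    return W[(j, k)] if (j, k) in W else W[(k, j)]

# Components of y, split by constraint group (the grouping is an assumption
# based on the order of the constraints in (4.5)).
y_enter = {1: 6, 2: 1, 3: 8, 4: 3, 5: 4}       # "enter vertex k" constraints
y_leave = {1: -3, 2: -3, 3: -1, 4: -6, 5: 0}   # "leave vertex j" constraints
y_cut = 5                                      # multiplier of (4.8)
cut_arcs = {(1, 2), (1, 5), (3, 5), (3, 2), (4, 5), (4, 2)}  # arcs in (4.8)

def dual_feasible():
    """Check one dual inequality per arc, and the sign of the cut multiplier."""
    for j in V:
        for k in V:
            if j == k:
                continue
            lhs = y_leave[j] + y_enter[k] + (y_cut if (j, k) in cut_arcs else 0)
            if lhs > w(j, k):
                return False
    return y_cut >= 0  # (4.8) is a >= constraint in a minimization problem

dual_value = sum(y_enter.values()) + sum(y_leave.values()) + y_cut
print(dual_feasible(), dual_value)
```

Every right-hand side in (4.5) and (4.8) equals 1, so the dual objective is just the sum of the components of $y$. Under these assumptions the check succeeds with value 14, which by weak duality is a lower bound on the weight of every tour.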

4.3 An Exact Defining System for $P_{TSP}$

The previous section is meant to illustrate how polyhedral techniques might
be used to solve hard combinatorial problems. Contained in the section are the
beginnings of what has proved to be a very successful “deep-cut method” for the
solution of combinatorial problems. Apart from its purely illustrative value, there
are also several important theoretical ideas contained in this development: first, the
idea of finding a good approximation to the convex hull of the desired combinatorial
solutions (in the case of the TSP, a good approximation to the polyhedron $P_{TSP}$);
second, the idea of being able to generate this approximation as it is needed, without having
to explicitly write down all constraints (the way in which (4.8) could be found from
the solution to (4.5)); finally, the idea that when this approach succeeds, it not only
provides a solution, but, via LP weak duality, a proof of optimality. The last two
points came up first when considering the formulation (4.7).

Let us now consider some of these issues in a more general context. It follows
from the Farkas-Minkowski-Weyl Theorem, (4.1.4), that there does exist an exact
description of $P_{TSP}$ in the form $\{x : Ax \le b\}$, but can we actually find $A$ and $b$ in
practice? More precisely, can we “find” an $A$ and $b$ that describe $P_{TSP}$ and are not
too complicated? For example, we certainly don't want $A$ to contain coefficients
with too many significant digits, so that even specifying an entry is in itself a
laborious procedure.

Considerations in theoretical computer science seem to suggest that the problem
of finding such an $A$ will be difficult, for this would imply that the TSP has a good
proof of optimality, and hence is in the class NP $\cap$ co-NP. To see this we apply
LP duality theory, as described below.

Consider the LP $\min\{w^T x : Ax \le b, x \ge 0\}$, where $P_{TSP} = \{x : Ax \le b\}$; the
added nonnegativity conditions $x \ge 0$ clearly do not change $P_{TSP}$. The dual of this
problem has constraints $A^T y \ge w$, $y \ge 0$. Now the primal clearly has an optimal
solution since the incidence vector of any optimal tour is also optimal for the LP.
Let $x^*$ be the incidence vector of such an optimal tour. Then by strong duality,
(4.1.2), there is an optimal dual solution $y^*$, and $w^T x^* = b^T y^*$. Now to “prove
optimality” we apply weak duality; thus, we must show how it can be quickly
checked, in polynomial time, that $Ax^* \le b$, $x^* \ge 0$, $A^T y^* \ge w$, and $y^* \ge 0$. The
matrix $A$ is likely, in fact certain, to be huge, having exponentially many rows.
Hence, it is not a priori clear how this checking is to be done. However, as $x^*$ is the
incidence vector of a tour, $x^* \ge 0$ is evident, and $Ax^* \le b$ we can check implicitly.
After all, if we have a theorem that claims $P_{TSP} = \{x : Ax \le b\}$, and in particular
that $P_{TSP}$ is a subset of this latter polyhedron, then it suffices simply to show that
$x^*$ is the incidence vector of a tour! This takes time $O(n)$.

Finally, consider $y^*$. Here the problem is somewhat more difficult. Because
$A$ has a large number of rows, $y^*$ could in principle have a large number of nonzero
components, and thus be itself non-polynomial in size. However, this difficulty can
be avoided, for we can assume that $y^*$ has been chosen to be an extreme point of
the dual feasible region, $\{y \ge 0 : A^T y \le w\}$. This follows from Proposition 4.1.5
since $y \ge 0$ prevents this set from having a nontrivial lineality space. Suppose that
$A$ is $K \times n$, where, as we have noted, $K$ can be large. Then by Proposition 4.1.3,
$y^*$ is the unique solution of some subsystem $(A')^T y = w'$, $y_j = 0$ $(j \in J)$. Since $y$
has $K$ components, this system must include at least $K$ equations, or the rank of
the system cannot be large enough to force a unique solution. But if the system
has at least $K$ equations, then $|J| \ge K - n$, since $A$ has $n$ columns. Hence, $y^*$ has at
most $n$ positive components. It follows that checking $y^* \ge 0$ is straightforward, as
is checking $A^T y^* \le w$, the latter computation taking time at most $O(n^2)$. Finally,
we need to verify $w^T x^* = b^T y^*$, which is trivial (given that we know $y^*$ does not
have too many nonzero coordinates), and the proof is complete.

The above arguments suggest that a good, exact description of $P_{TSP}$ is unlikely
to be found. This is a negative result. However, the practical implications of the
above ideas, particularly the examples of the previous section, are meant to be
positive. They illustrate that even though a complete understanding of $P_{TSP}$ is
desired, it is not necessarily needed to get optimality. Indeed, we have proved that
there is always a small number of constraints that suffice, for a given $w$, to get an
optimal solution and prove its optimality. What we have not shown is how to find
these constraints, and that the constraints themselves are necessarily easy to write
down.


Exercises 


4.1 Show that the formulation of the TSP given by (4.6) is valid. That is, show that an
integral $x$ satisfies the constraints of (4.6) if and only if $x$ is the incidence vector of a tour.

4.2 The recession cone of a convex set $P$, rec $P$, is defined by rec $P = \{y : y + x \in P \ \forall x \in P\}$.
Show that a closed convex set $P$ is unbounded iff rec $P$ contains some nonzero vector. (Can
you give an example showing that this is false if $P$ is not closed?)

I will accept a proof that this is true for polyhedra, which is simpler. A proof for general
closed convex sets seems to be an exercise in analysis.

4.3 Show that dim $P_{TSP} = n^2 - 3n + 1$. (Hint: The first step is to use the constraints of
(4.2.3) to show that dim $P_{TSP} \le n^2 - 3n + 1$.)


Chapter 5

Facets  of  Polyhedra 


5.1  Introduction 

In this chapter we study the polyhedron of the minimum spanning tree problem
and give a complete description. A key idea is the notion of a “facet,” or “maximal
face.” For hard combinatorial problems, where finding a complete description of
the underlying polyhedra is expected to be hard (see the last section of Chapter
4), finding theoretical descriptions of facets is a crucial ingredient in developing
good computational procedures. In the context of integer programming, facets are
sometimes called deep or strong cuts.

In Chapter 6 we discuss the ellipsoid method and prove a result of Grötschel,
Lovász and Schrijver showing that polynomial-time optimization is equivalent to the
existence of a polynomial-time algorithm for finding separating hyperplanes, given
the ellipsoid method. This fact yields the only known polynomial-time algorithms
for several important combinatorial problems. It also provides a general theoretical
basis for the polyhedral approach to CO problems: It implies that if a complete
description of a polyhedron can be found, and the separation problem can be solved,
then optimization is possible in polynomial time over that polyhedron.


5.2  More  Polyhedral  Preliminaries 

We  need  a  notion  of  dimension  for  polyhedra,  from  which  come  the  tools  to  deal 
with  facets. 

An affine combination of vectors $x^j \in \mathbb{R}^n$ $(j = 1, \ldots, k)$ is a linear combination
of the form $\alpha_1 x^1 + \cdots + \alpha_k x^k$, where $\alpha_1 + \cdots + \alpha_k = 1$. Thus, an affine combination
is a convex combination in which some coefficients may be negative. For $X \subseteq \mathbb{R}^n$,
the affine hull of $X$, aff $X$, is defined to be the set of all finite affine combinations
of vectors in $X$. $X$ is said to be affine if aff $X = X$. For $a \in \mathbb{R}^n$, $X + a$ denotes the
set $\{x + a : x \in X\}$.

Proposition 5.2.1 $X$ is affine iff $X = S + a$ for some linear subspace $S$ and some
vector $a$; indeed, if $X$ is affine then $X - a$ is a linear subspace, the same linear
subspace for any $a \in X$.

Proof. Suppose $X = S + a$, where $S$ is a linear subspace. Then an affine
combination $x = \sum_{j \in J} \alpha_j x^j$ of vectors in $X$ satisfies $x = a + \sum_{j \in J} \alpha_j (x^j - a) \in X$, since
$\sum_{j \in J} \alpha_j = 1$, $x^j - a \in S$ $(j \in J)$ and $S$ is closed under linear combinations.

To prove the converse, let $X$ be affine and let $a \in X$. We show that $S = X - a$
is a subspace. For a linear combination $s = \sum_{j \in J} \alpha_j (x^j - a)$ of vectors in $X - a$, we
have

$$s = \sum_{j \in J} \alpha_j x^j + \Bigl(1 - \sum_{j \in J} \alpha_j\Bigr) a - a \in X - a$$

since $\sum_{j \in J} \alpha_j + (1 - \sum_{j \in J} \alpha_j) = 1$, $a \in X$ and $X$ is closed under affine combinations. □

If $X$ is a convex set, we define the dimension of $X$, dim $X$, to be the dimension
of the subspace (aff $X$) $- a$, where $a \in X$.
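For the convex hull of finitely many points, this definition reduces to a rank computation: fixing one point $p^0$, the subspace (aff $X$) $- p^0$ is spanned by the differences $p - p^0$. The sketch below computes dimensions this way with exact arithmetic. It then applies the computation to the tour incidence vectors of $K_n$ for $n = 3$ and $n = 4$, as a numerical spot check of the value $n^2 - 3n + 1$ stated in Exercise 4.3 (a check for two small cases, not a proof). The function names are illustrative, and vertices are numbered from 0.

```python
from fractions import Fraction
from itertools import permutations

def rank(rows):
    """Rank of a matrix (given as a list of rows) by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    ncols = len(m[0]) if m else 0
    r = 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def dim_conv(points):
    """Dimension of conv(points): rank of the differences p - p0."""
    p0 = points[0]
    return rank([[a - b for a, b in zip(p, p0)] for p in points[1:]])

def tour_vectors(n):
    """Incidence vectors of all tours of the complete digraph on n vertices."""
    arcs = [(j, k) for j in range(n) for k in range(n) if j != k]
    vectors = []
    for rest in permutations(range(1, n)):
        order = (0,) + rest  # every tour is listed once, starting at vertex 0
        used = {(order[i], order[(i + 1) % n]) for i in range(n)}
        vectors.append([1 if a in used else 0 for a in arcs])
    return vectors

# A unit square embedded in R^3 has dimension 2.
print(dim_conv([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]]))
for n in (3, 4):
    print(n, dim_conv(tour_vectors(n)), n * n - 3 * n + 1)
```

For $n = 3$ there are two tours and the dimension is 1; for $n = 4$ the six tour vectors are affinely independent, giving dimension 5.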

Now, let $P = \{x : Ax \le b\}$ be a polyhedron. Let $A^= x \le b^=$ be the subsystem
of $Ax \le b$ such that $A^= x = b^=$ for all $x \in P$. The equations $A^= x = b^=$ are called
implicit equations of $P$. The remaining inequalities in the system $Ax \le b$ are
denoted $A^+ x \le b^+$. Note that if $P \ne \emptyset$, then $A^+ x < b^+$ for some $x \in P$ (if $A = A^=$
we take this to be vacuously true, and otherwise take a proper convex combination
of a set of vectors such that for each of the inequalities in $A^+ x \le b^+$ one of the
vectors in the set satisfies this inequality strictly). The next proposition implies
that $\{x : A^= x = b^=\}$ is independent of the choice of $A$ and $b$, depending only on $P$.

Proposition 5.2.2 If $P \ne \emptyset$, then aff $P = \{x : A^= x = b^=\}$.

Proof. Suppose $z = \alpha_1 z^1 + \cdots + \alpha_k z^k$, where $z^j \in P$ $(j = 1, \ldots, k)$ and
$\alpha_1 + \cdots + \alpha_k = 1$. Then by the definition of $A^=, b^=$ we have $A^= z^j = b^=$ for each $j$, and so

$$A^= z = \sum_{j=1}^{k} \alpha_j A^= z^j = \sum_{j=1}^{k} \alpha_j b^= = b^=.$$

This proves aff $P \subseteq \{x : A^= x = b^=\}$.

To prove the converse, let $z \in \{x : A^= x = b^=\}$, and let $y \in P$ be such that
$A^+ y < b^+$. Now for $\alpha > 0$, define $y^\alpha = y + \alpha(z - y)$. Note that

$$A^= y^\alpha = A^= y + \alpha(A^= z - A^= y) = b^= + \alpha(b^= - b^=) = b^=;$$

moreover, for the remaining inequalities in $Ax \le b$ we have $A^+ y^\alpha = A^+ y + \alpha(A^+ z - A^+ y)$.
Hence, $A^+ y < b^+$ implies that for sufficiently small $\alpha > 0$, $A^+ y^\alpha \le b^+$, and so
$y^\alpha \in P$. But $z = \frac{1}{\alpha} y^\alpha + (1 - \frac{1}{\alpha}) y$, and so $z \in$ aff $P$. □

The above result implies that $\{x : A^= x = b^=\}$ is independent of the choice of
$A$, $b$. It also implies:

Corollary 5.2.3 dim $P = n - $ rank $A^=$. □

We next define face and facet. A face of a polyhedron $P$ is a nonempty set of
the form $F = \{x \in P : a^T x = \beta\}$ where $a^T x \le \beta$ must be a valid inequality for $P$, that
is, $P \subseteq \{x : a^T x \le \beta\}$. The hyperplane $\{x : a^T x = \beta\}$ is then called a supporting
hyperplane of $P$. A face $F$ of $P$ is proper if $F \ne P$. A maximal proper face is a
facet.

We  have  the  following  relationship  between  the  faces  of  a  polyhedron  and  any 
representing  set  of  inequalities. 

Proposition 5.2.4 $F$ is a face of a polyhedron $P = \{x : Ax \le b\}$ iff $F \ne \emptyset$ and
$F = \{x \in P : A'x = b'\}$ for some subsystem $A'x \le b'$ of $Ax \le b$.

Proof. For one direction of the proof let $F = \{x \in P : A'x = b'\}$. Define $a^T = e^T A'$
and $\beta = e^T b'$, where $e$ is a vector of all ones. Then clearly $\{x \in P : a^T x = \beta\} = F$,
since $x \in P$ (that is, $Ax \le b$) implies $e^T A'x = e^T b'$ iff $A'x = b'$. Hence $F$ is a face.

The proof of the converse is a bit more tedious. Let $F = \{x \in P : a^T x = \beta\}$ be
a face, and let $A'x \le b'$ be the set of all inequalities of $Ax \le b$ satisfied at equality
by every $x \in F$. Let $F' = \{x \in P : A'x = b'\}$. Clearly $F \subseteq F'$. Suppose there is a
vector $y \in F' \setminus F$. Let $A''x \le b''$ be the set of inequalities from $Ax \le b$ not included
in $A'x \le b'$. We may assume that there is a $z \in F$ such that $A''z < b''$. This follows
from the convexity of $F$ and the choice of $A'$ and $b'$. Let $z^\alpha = z + \alpha(y - z)$. Since
$A''z < b''$ and $A'z = b'$, it follows that $z^\alpha \in P$ for all $\alpha$ sufficiently near zero. But
$a^T(y - z) \ne 0$, and so $a^T z^\alpha > \beta$ for some such $\alpha$, contradicting the fact that $a^T x \le \beta$
is valid. This contradiction completes the proof. □

The  above  result  implies  that  P  has  only  a  finite  number  of  faces.  The  next 
result  characterizes  facets  in  terms  of  a  defining  system  of  inequalities.  Call  a 
defining  system  irredundant  if  every  inequality  is  essential,  that  is,  if  the  removal 
of  any  inequality  from  the  system  enlarges  the  polyhedron. 

Theorem  5.2.5  Let  Ax  <  b  be  an  irredundant  defining  system  for  a  polyhedron  P. 
Then  F  is  a  facet  of  P  iff  F  =  {x  P  :  d^x  =  /?}  for  some  row  [a^  /3]  of  [A'*’  b^]. 

Proof. To prove one half of the theorem, let $[a^T\ \beta]$ be a row of $[A^+\ b^+]$ and let
$F = \{x \in P : a^T x = \beta\}$. Clearly $F$ is a proper face since none of the inequalities
in $A^+ x \le b^+$ is implicit. To show that $F$ is maximal, let $x^1 \in P$ be such that
$A^+ x^1 < b^+$. By irredundancy there is an $x^2$ such that $A^= x^2 = b^=$, $A'x^2 \le b'$ and
$a^T x^2 > \beta$, where $A'x \le b'$ is the system $A^+ x \le b^+$ with $a^T x \le \beta$ removed. But
then for some convex combination $x^0$ of $x^1$ and $x^2$ we have $A^= x^0 = b^=$, $A'x^0 < b'$
and $a^T x^0 = \beta$, which implies that $F$ is maximal. In particular, by (5.2.4), if there
were a proper face containing $F$ it would be given by some of the inequalities from
$Ax \le b$, and we have just shown that every such inequality that is not implicit, and
is different from $a^T x \le \beta$, excludes some point of $F$, namely $x^0$.

For the converse let $F$ be a facet. Then by (5.2.4), $F = \{x \in P : A'x = b'\}$ for
some subsystem $A'x \le b'$ of $A^+ x \le b^+$. Taking one of the equalities from $A'x = b'$ suffices,
since none are implicit. []

Corollary  5.2.6  If  F  is  a  face  of  P,  then  F  is  a  facet  iff  dimF  =  dimP  —  1. 

Proof.  Clearly  two  affine  sets,  one  containing  the  other  and  both  with  the  same 
dimension,  are  equal,  because  subspaces  of  equal  dimension  are  equal;  moreover,  if 
$F_1$ and $F_2$ are two faces of the same polyhedron, then $F_1 = F_2$ iff $\mathrm{aff}\,F_1 = \mathrm{aff}\,F_2$.
Hence, $\dim F = \dim P - 1$ for a face $F$ implies $F$ is a facet.

To prove the converse let $F = \{x \in P : a^T x = \beta\}$ where $[a^T\ \beta]$ is a row
of $[A\ b]$ and $Ax \le b$ is an irredundant system determining $P$ (we may assume
$F$ has this form by (5.2.5)). Now clearly $F = \{x : Ax \le b,\ a^T x \ge \beta\}$ and by
the proof of (5.2.5), the only implicit equalities in this system are $A^= x = b^=$ and
$a^T x = \beta$ ($x^0$ from the proof of (5.2.5) shows this). Hence, by Corollary 5.2.3,
$\dim F = n - \mathrm{rank}\,A^= - 1 = \dim P - 1$. []

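Corollary 5.2.7 Suppose $F$ is a facet of $P = \{x : Ax \le b\}$ and

$F = \{x \in P : a^T x = \beta\} = \{x \in P : \bar{a}^T x = \bar{\beta}\}$

where $a^T x \le \beta$ and $\bar{a}^T x \le \bar{\beta}$ are valid. Then there exists a vector $z$ and a scalar
$\alpha > 0$ such that $[\bar{a}^T\ \bar{\beta}] = \alpha[a^T\ \beta] + z^T[A^=\ b^=]$.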

Proof. Let $[\hat{A}\ \hat{b}]$ and $[\bar{A}\ \bar{b}]$ be $[A^=\ b^=]$ with $[a^T\ \beta]$ and $[\bar{a}^T\ \bar{\beta}]$, respectively, appended
as the last row. Then $\dim F = \dim P - 1$ implies $\mathrm{aff}\,F = \{x : \hat{A}x = \hat{b}\} = \{x : \bar{A}x = \bar{b}\}$,
since both of the latter sets contain $F$, both are affine, and both have dimension
less than $\dim\{x : A^= x = b^=\} = \dim P$. This proves that $[\bar{a}^T\ \bar{\beta}]$ is represented as
claimed in the corollary, where $\alpha \ne 0$. To prove $\alpha > 0$, let $x \in P$ be such that
$a^T x < \beta$. Then $\alpha < 0$ implies $\bar{a}^T x = \alpha a^T x + z^T A^= x > \alpha\beta + z^T b^= = \bar{\beta}$, a contradiction. []

5.3  A Minimum-Spanning-Tree Polyhedron

We  saw  at  the  end  of  the  previous  chapter  that  finding  a  complete  description  of 
the polytope corresponding to the traveling salesman problem is likely to be very
difficult.  In  this  section  we  show  that,  on  the  contrary,  such  a  result  can  be  obtained 
for  the  polyhedron  associated  with  the  minimum  spanning  tree  problem. 

Let $G$ be an undirected graph and let $\mathcal{I}$ be the family of edge-sets of forests of $G$.
Define $P_{MST} = \mathrm{conv}\{x^J : J \in \mathcal{I}\}$, where $x^J$ is the incidence vector of $J$. We call $P_{MST}$ the
minimum-spanning-tree (MST) polyhedron. The following theorem gives a complete description of $P_{MST}$. The proof
is  derived  from  a  proof  for  the  polyhedron  of  “independent  sets  of  a  matroid”  given 
in  [5]. 


Theorem 5.3.1 Let $G = (V, E)$ be a graph and let $E_0 = \{e \in E : e \text{ is a loop}\}$.
Then $P_{MST}$ for $G$ is the set of solutions of the following system of inequalities:

(i)  $x(E(W)) \le |W| - 1 \quad \forall\, W \subseteq V,\ |W| \ge 2$;

(ii)  $x(e) \ge 0 \quad \forall\, e \in E$;

(iii)  $x(e) = 0 \quad \forall\, e \in E_0$.

Remark: $E(W)$ denotes the set of edges in $G$ with both ends in $W$. Similarly, for
a  subset  of  edges  A  of  G,  V(A)  denotes  the  set  of  vertices  incident  to  A.  Note  that 
(i)  trivially  holds  if  |W|  =  1. 

Proof. Let $P$ denote the solution set of (i,ii,iii). First we show $P_{MST} \subseteq P$. Since $P$
is convex, it suffices to show that $x^J$ satisfies (i,ii,iii) for each edge-set $J$ of a forest.
Obviously (ii) holds, and (iii) holds because no loop can be in any forest. To see
that (i) holds let $W \subseteq V(G)$ and let $W_1, \dots, W_k$ be the vertex-sets of the connected
components of $J \cap E(W)$. Then we have

\begin{align*}
x^J(E(W)) &= |J \cap E(W)| \\
          &= |J \cap E(W_1)| + \cdots + |J \cap E(W_k)| \\
          &= (|W_1| - 1) + \cdots + (|W_k| - 1) \\
          &\le |W| - 1,
\end{align*}

where the third equality follows because $J \cap E(W_j)$ is a tree $(j = 1, \dots, k)$ (see
2.1.1). This proves $P_{MST} \subseteq P$.
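
The inclusion just proved can be checked by brute force on small examples. The following is a minimal sketch, assuming a graph given as a vertex list and a list of edge pairs; the function name and the example graphs are illustrative and not part of the notes. For a 0/1 incidence vector $x^J$, the quantity $x^J(E(W))$ is simply the number of edges of $J$ with both ends in $W$.

```python
from itertools import combinations


def satisfies_rank_inequalities(vertices, edge_set):
    """Check |J ∩ E(W)| <= |W| - 1 for every vertex subset W with |W| >= 2.

    Brute force over all subsets, so only suitable for tiny graphs.
    """
    for size in range(2, len(vertices) + 1):
        for W in combinations(vertices, size):
            inside = sum(1 for u, v in edge_set if u in W and v in W)
            if inside > size - 1:
                return False
    return True


V = [1, 2, 3, 4]
path = [(1, 2), (2, 3), (3, 4)]       # a forest (in fact a spanning tree)
triangle = [(1, 2), (2, 3), (1, 3)]   # contains a circuit, so not a forest
path_ok = satisfies_rank_inequalities(V, path)
triangle_ok = satisfies_rank_inequalities(V, triangle)
```

The triangle fails for $W = \{1, 2, 3\}$, where three edges lie inside a set of three vertices.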

To show the converse, $P \subseteq P_{MST}$, we show two things: (a) that $\mathrm{aff}\,P_{MST} =
\{x : x(e) = 0,\ e \in E_0\}$, and (b) that any facet of $P_{MST}$ is a nonnegative multiple
of some inequality of type (i) or (ii) plus a linear combination of equations of type
(iii). Since by the Farkas-Minkowski-Weyl Theorem, (4.1.4), $P_{MST}$ is the solution
set of some system of inequalities, and hence of an irredundant system (in the sense
of the previous section), it follows from Theorem 5.2.5 and the above claims (a) and
(b), once they have been proved, that any $x \in P$ must satisfy all the inequalities of
any such irredundant defining system, and hence that $x \in P_{MST}$.

First we prove (a). The equations (iii) are valid for $P_{MST}$ (as noted in the
first paragraph of the proof) and are linearly independent. Hence, $\dim P_{MST} \le
|E| - |E_0|$. But $P_{MST}$ contains the 0-vector and the linearly independent vectors
$\{x^{\{e\}} : e \in E \setminus E_0\}$, and so $\dim P_{MST} \ge |E| - |E_0|$. Hence, equality holds. This
proves (a).

To prove (b), let $F_a = \{x \in P_{MST} : a^T x = \beta\}$ be a facet of $P_{MST}$ and let $\mathcal{I}_a =
\{I \in \mathcal{I} : x^I \in F_a\}$. We may assume that $a_j = 0$ for $j \in E_0$ since scalar multiples
of the valid equations $x(e) = 0$ $(e \in E_0)$ may be subtracted from $a^T x = \beta$ without
changing $F_a$. There are now two cases to consider:

Case 1. Suppose $a_j < 0$ for some $j$. Now if $j \in I$ for some $I \in \mathcal{I}_a$, then
$\beta \ge a^T x^{I \setminus \{j\}} = a^T x^I - a_j > \beta$, a contradiction. It follows that $x \in F_a$
implies $x_j = 0$, since by the definition of $P_{MST}$, $F_a = \mathrm{conv}\{x^I : I \in \mathcal{I}_a\}$.
Hence $F_a \subseteq F_j = \{x \in P_{MST} : x_j = 0\}$. But $F_j$ is a proper face of $P_{MST}$ since
$j \in E \setminus E_0$. Hence, by the maximality of facets, $F_a = F_j$. Since $a_k = 0$ for
$k \in E_0$, Corollary 5.2.7 implies that $a^T x \le \beta$ is a positive multiple of the
inequality $-x_j \le 0$.

Case 2. Suppose $a \ge 0$ and let $A = \{j : a_j > 0\}$. We claim that every $I \in \mathcal{I}_a$
is the edge-set of a maximal forest in $A$. Suppose not. Let $I \in \mathcal{I}_a$ be such
that $|I| < m$, where $m$ is the size of a maximal forest in $A$. Extend $I$ to a
maximal forest $I \cup K$ of $A$. Then we have

$\beta \ge a^T x^{I \cup K} > a^T x^I = \beta$,

a contradiction. Hence $F_a$ is a subset of the face $F = \{x \in P_{MST} : x(A) = m\}$;
moreover, $F$ is a proper face since $A$ is a nonempty subset of $E \setminus E_0$. It
follows that $F_a = F$, and so $a_k = 0$ for $k \in E_0$ implies that $a^T x \le \beta$ is a
positive multiple of the inequality $x(A) \le m$.

Now to complete the proof of this case, let $W$ be the vertex-set of a component
of $G(A)$ and let $F' = \{x \in P_{MST} : x(E(W)) = |W| - 1\}$. $F'$ is a proper face of
$P_{MST}$ and $F_a \subseteq F'$. As above we conclude that $F_a = F'$ and that $A = E(W)$.

[]

The previous result gives a defining system for $P_{MST}$, but, as the last part of the
proof suggests, it is not an irredundant system. Define a graph to be 2-connected
if every pair of edges is contained in a common circuit. If the subgraph induced by
$E(W)$ is not 2-connected for some subset of vertices $W$, then where $W_1, \dots, W_k$ are
the vertex-sets of the "2-connected components" of the subgraph on $E(W)$, (i) for
$W$ is implied by (i) for $W_j$ $(j = 1, \dots, k)$. This follows from the
observation that a forest of a graph is maximal iff it is a spanning tree of each of
the 2-connected components of the graph. This is the only redundancy that occurs.

Theorem 5.3.2 If $W \subseteq V(G)$ and $E(W)$ is a 2-connected subset of edges, then
the inequality $x(E(W)) \le |W| - 1$ defines a facet of $P_{MST}$. []


Exercises 


5.1  Show  that  exact  Gaussian  elimination  for  rational  linear  systems  is  polynomial. 

Remark: In order to make the intent of the problem clear, we first specify what is meant
by Gaussian elimination, and why it is not obviously polynomial. Let $Ax = b$ be an $n \times n$
system of equations involving only rational coefficients, and assume for convenience that
$A$ is nonsingular. Suppose that $a_{11} \ne 0$. Then eliminating on $a_{11}$ means replacing $a_{ij}$ and
$b_i$ for $i \ge 2$ by $a_{11}a_{ij} - a_{i1}a_{1j}$ and $a_{11}b_i - a_{i1}b_1$, respectively. Repeating this elimination
procedure $n - 1$ times, following each elimination by a possible reordering of rows to ensure
that the "pivot element" is nonzero, is what we call Gaussian elimination. Clearly, once the
procedure is completed, solving for $x$ is merely a matter of back substitution.

How efficient is this procedure? The number of arithmetic operations is clearly polynomial,
$O(n^3)$, but how much work does each of the operations take? Since we have said that
the arithmetic must be exact, this work cannot be ignored. Each of the numbers in $A$ and
$b$ is rational, and so can be taken as represented as a ratio of two integers. The size of each
of the numbers is thus the total number of digits in the numerator and denominator (this
is, after all, to within a constant how much space it takes to store these integers). Now if
$M$ is the size of the largest of the $a_{ij}$, then after one elimination, the size of the largest
coefficient can apparently be as big as $O(M^2)$. Thus, the number of digits may roughly
double. (Note that simply normalizing does not help because of our insistence on exact
arithmetic. Indeed, this would seem to make the growth worse.) After $n - 1$ eliminations,
the number of digits becomes $O(2^{n-1})$, and this is not polynomial!

Jack  Edmonds  was  the  first  to  prove  that  there  is  a  way  around  this  difficulty.  He 
observed that after the second elimination is carried out, $a_{11}$ divides every entry in rows 3
up  to  n.  One  can  prove  the  desired  result  by  verifying  Edmonds’  observation,  and  showing 
how  its  repeated  application  can  be  used  to  keep  the  number  of  digits  from  growing  too 
rapidly. 
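
Edmonds' observation can be checked numerically before attempting the proof. The sketch below assumes an integer matrix, so that divisibility is meaningful, and uses a hypothetical 4 x 4 example that does not come from the notes. The function `eliminate` implements the cross-multiplication step defined in the remark, without row reordering.

```python
def eliminate(M, k):
    """One elimination step on the pivot M[k][k] (0-based indices).

    Row i > k becomes M[k][k] * row_i - M[i][k] * row_k, so integer
    entries stay integers and no division is performed.
    """
    pivot = M[k][k]
    if pivot == 0:
        raise ValueError("zero pivot; reorder rows first")
    out = [row[:] for row in M]
    for i in range(k + 1, len(M)):
        out[i] = [pivot * M[i][j] - M[i][k] * M[k][j] for j in range(len(M[i]))]
    return out


# Hypothetical integer example with a_11 = 2.
M0 = [[2, 1, 3, 1],
      [4, 5, 6, 2],
      [7, 8, 10, 3],
      [1, 3, 2, 5]]
M1 = eliminate(M0, 0)   # first elimination, on a_11
M2 = eliminate(M1, 1)   # second elimination, on the new (2, 2) entry
a11 = M0[0][0]
# After the second elimination, a_11 should divide every entry in rows 3..n.
divisible = all(entry % a11 == 0 for row in M2[2:] for entry in row)
```

Dividing rows 3 to n by $a_{11}$ at this point, and repeating the idea at later steps, is what keeps the entries from doubling in length at every elimination.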


5.2  Prove Theorem 5.3.2: If $W$ is a subset of vertices of a graph $G$ and $E(W)$ is a
2-connected subset of edges, then the inequality

$x(E(W)) \le |W| - 1$

defines a facet of $P_{MST}$. (A graph is 2-connected if every pair of edges is contained in a
circuit.)

Outline of Proof: It is sufficient to prove the result when $G$ has no loops. Then $P_{MST}$
has dimension $|E|$. Let $\mathcal{I}$ be the family of all edge-sets of forests $I$ such that $x^I$ satisfies
$x^I(E(W)) = |W| - 1$. Let $A$ be a matrix the rows of which are the incidence vectors $x^I$
($I \in \mathcal{I}$). Conclude that if the given inequality is not a facet, then $\mathrm{rank}(A) \le |E| - 1$. Show
that for any maximal forest $I$ of $E(W)$ and any element $f \in E \setminus E(W)$, $x^{I \cup \{f\}}$ is a row of
$A$.

Now, let $z \in R^E$ be a nonzero vector such that $Az = 0$ ($A$ has $|E|$ columns). Note that
$z$ is zero outside $E(W)$. Let $I_1 = \{f \in E(W) : z_f < 0\}$ and $I_2 = \{f \in E(W) : z_f > 0\}$.
Both sets are nonempty ($A$ has no zero column). Show that no circuit in $E(W)$ intersects
both $I_1$ and $I_2$, a contradiction. []


Chapter  6 


Ellipsoids 


6.1  Overview 

We  present  a  restricted  version  of  the  ellipsoid  method,  Algorithm  6.3.7,  and  prove 
it is polynomial time bounded. Then we present a result of Grötschel, Lovász, and
Schrijver  proving  that  optimization  is  equivalent  to  separation.  This  latter  result 
has  important  implications  for  combinatorial  optimization.  Our  development  is 
based  on  Chapter  13  of  [10],  and  in  part  on  [4],  pp.  443-454. 

The  ellipsoid  method  was  developed  over  a  period  of  years  by  several  Russian 
mathematicians  as  a  way  to  solve  general  nonlinear  programs,  and  convex  programs 
in  particular.  The  method  can  be  viewed  as  having  emerged  from  two  separate  lines 
of  development. 

In  1964  N.  Z.  Shor  presented  a  general  “subgradient  method,”  a  generalization 
of  what  has  come  to  be  known  as  a  “relaxation  method.”  In  this  method  a  feasible 
solution  of  a  system  of  inequalities  is  found  by  successively  projecting  onto  violated 
inequalities.  Shor  later  realized  (circa  1970)  that  his  method  could  be  improved 
by appropriately transforming the space at each iteration (an idea not completely
unrelated  to  Karmarkar’s  method  for  linear  programming). 

The  second  line  of  development  originates  with  work  by  A.  Ju.  Levin  in  1965  in 
which  he  discussed  a  method  of  “central  sections”  for  general  convex  programming. 
D.  B.  Judin  and  A.  S.  Nemirovskii  later  noticed  (1976)  that  if  ellipsoids  were  used 
in  Levin’s  method,  then  it  became  more  efficient  and  that  using  these  ellipsoids 
could  be  viewed  as  using  a  particular  choice  of  transformation  in  Shor’s  method. 
In  addition,  they  proved  that  for  a  certain  class  of  problems  the  ellipsoid  method 
could be used to approximate the optimal solution to within a given accuracy $\sigma$ in
time polynomial in the "size" of the problem and $\log 1/\sigma$.¹

¹ $\log n$ stands for $\log_2 n$.

Finally, in 1979 Khachian proved that for LPs with integral data one could get
an  exact  solution  in  polynomial  time.  It  was  this  result,  brought  to  the  attention 
of  the  western  mathematical  programming  community  at  the  1979  Oberwolfach 
meeting  in  Germany,  that  caused  such  a  stir.  It  solved  the  long-standing  problem 
of  finding  a  theoretically  efficient  algorithm  for  LPs.  The  method  has  not,  however, 
proved  effective  as  a  practical  method  for  solving  LPs.  Its  importance  for  us  is 
based  on  its  theoretical  implications  in  combinatorial  optimization. 

Our  development  of  the  ellipsoid  method  proceeds  as  follows.  The  method  is 
most  directly  viewed  as  a  method  for  testing  the  feasibility  of  systems  of  linear 
inequalities, that is, as a method for testing whether a polyhedron $P = \{x : Ax \le
b\} \subseteq R^n$ is nonempty. Thus, we begin by showing that testing feasibility is enough to
solve LPs. Having made this reduction, we describe the ellipsoid method under two
restrictive assumptions: that $P$ is bounded and either empty or full dimensional,
$\dim P = n$. These assumptions remove certain technical difficulties, making the
presentation more direct. Their relaxation is treated in the exercises.

6.2  Reduction  to  Testing  Feasibility 

The  first  step  in  the  reduction  is  to  apply  LP  duality  theory.  Consider  the  LP 

\[
(P) \qquad \min\ c^T x \quad \text{s.t.} \quad Ax \ge b, \quad x \ge 0.
\]

The dual of (P) is the problem

\[
(D) \qquad \max\ b^T y \quad \text{s.t.} \quad A^T y \le c, \quad y \ge 0.
\]

By the LP strong duality theorem, (4.1.2), we know that $x^*$ is an optimal solution
of (P) iff $x^*$ is feasible for (P) and there exists a feasible $y^*$ for (D) such that
$c^T x^* = b^T y^*$. But a simple calculation, (4.1.1), shows that if $x^*$ and $y^*$ are feasible
to (P) and (D), respectively, then $c^T x^* \ge b^T y^*$. We conclude that testing the
feasibility of the following linear inequality system is equivalent to solving (P) (and
(D)):

\begin{align*}
Ax &\ge b \\
-A^T y &\ge -c \\
-c^T x + b^T y &\ge 0 \\
x &\ge 0 \\
y &\ge 0
\end{align*}
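
To make the combined system concrete, here is a minimal sketch that assembles it as a list of rows $g^T w \ge h$ in the stacked variable $w = (x, y)$. The function names and the one-constraint example LP are assumptions made for illustration; they do not appear in the notes.

```python
def optimality_system(A, b, c):
    """Rows (g, h), each meaning g . w >= h with w = (x, y), of the combined system."""
    m, n = len(A), len(c)
    rows = []
    for i in range(m):                                # A x >= b
        rows.append((list(A[i]) + [0] * m, b[i]))
    for j in range(n):                                # -A^T y >= -c
        rows.append(([0] * n + [-A[i][j] for i in range(m)], -c[j]))
    rows.append(([-cj for cj in c] + list(b), 0))     # -c^T x + b^T y >= 0
    for k in range(n + m):                            # x >= 0 and y >= 0
        unit = [0] * (n + m)
        unit[k] = 1
        rows.append((unit, 0))
    return rows


def satisfies(rows, w):
    """True if w satisfies every row g . w >= h."""
    return all(sum(gk * wk for gk, wk in zip(g, w)) >= h for g, h in rows)


# Hypothetical LP: min x1 + 2*x2  s.t.  x1 + x2 >= 1, x >= 0.
A, b, c = [[1, 1]], [1], [1, 2]
system = optimality_system(A, b, c)
optimal_pair_ok = satisfies(system, [1, 0, 1])      # x* = (1, 0), y* = 1
suboptimal_pair_ok = satisfies(system, [0, 1, 1])   # x feasible but not optimal
```

The pair with $x = (0, 1)$ is primal and dual feasible but fails the row $-c^T x + b^T y \ge 0$, which is the only row that couples the two problems.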

We carry the reduction one step further by showing that testing a system for
solvability  in  polynomial  time,  and  finding  a  solution  if  one  exists,  is  equivalent 
to  simply  recognizing  solvable  systems  in  polynomial  time.  Thus,  we  show  that 
having,  say,  a  “subroutine”  or  “oracle”  that  recognizes  solvable  systems  implies  the 
existence  of  a  method  to  actually  construct  solutions. 

Consider  a  system  Ax  <  b  and  suppose  we  have  a  subroutine  that  recognizes 
solvability.  If  the  system  has  no  solution,  there  is  nothing  to  do.  Assume  the 
contrary.  We  then  perform  two  reductions  (the  second  reduction  actually  being  an 
expansion): 

Reduction 1. Remove columns from $A$, and the corresponding variables from
$x$, until any further removals destroy feasibility. Denote the result by $Ax \le b$,
the same as the original.

Reduction 2. Expand the system $Ax \le b$ by, for each of the inequalities
$a^T x \le \beta$ in the system, adding the reverse inequality $a^T x \ge \beta$ to the system,
when doing so preserves feasibility. Again, denote the final result by $Ax \le b$.

Clearly, both of the above reductions can be carried out with a polynomial number
of calls to the assumed subroutine: if $A$ is $m \times n$, Reduction 1 requires at most $n$
calls and Reduction 2 at most $m$ calls.
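
The two reductions can be sketched as follows. The notes assume only that some solvability oracle exists; here, purely as a stand-in so that the example runs, the oracle is Fourier-Motzkin elimination in exact arithmetic, which is exponential in general and is not the method developed in this chapter. All function names and the example system are illustrative assumptions.

```python
from fractions import Fraction


def solvable(rows):
    """Stand-in oracle: is {x : a . x <= beta for (a, beta) in rows} nonempty?"""
    rows = [([Fraction(v) for v in a], Fraction(beta)) for a, beta in rows]
    n = len(rows[0][0]) if rows else 0
    for j in range(n):                       # eliminate variable j
        pos = [r for r in rows if r[0][j] > 0]
        neg = [r for r in rows if r[0][j] < 0]
        rows = [r for r in rows if r[0][j] == 0]
        for ap, bp in pos:
            for an, bn in neg:
                sp, sn = 1 / ap[j], -1 / an[j]   # positive multipliers
                rows.append(([sp * u + sn * v for u, v in zip(ap, an)],
                             sp * bp + sn * bn))
    return all(beta >= 0 for _, beta in rows)    # only rows 0 <= beta remain


def reduce_system(rows):
    """Apply Reductions 1 and 2; return (kept column indices, tight row indices)."""
    n = len(rows[0][0])

    def project(cols):
        return [([a[j] for j in cols], beta) for a, beta in rows]

    kept = list(range(n))
    for j in range(n):                       # Reduction 1: n oracle calls
        trial = [k for k in kept if k != j]
        if solvable(project(trial)):
            kept = trial
    current = project(kept)
    tight = []
    for i, (a, beta) in enumerate(project(kept)):   # Reduction 2: m oracle calls
        reverse = ([-v for v in a], -beta)          # a . x >= beta
        if solvable(current + [reverse]):
            current.append(reverse)
            tight.append(i)
    return kept, tight


# Hypothetical system: -x1 - x2 <= -1, x1 <= 3, x2 <= 3.
example = [([-1, -1], -1), ([1, 0], 3), ([0, 1], 3)]
kept_columns, tight_rows = reduce_system(example)
```

For this example only the second variable survives Reduction 1 and only the first inequality becomes an equation, so the remaining equality system is $-x_2 = -1$; setting the deleted variable to zero gives the solution $x = (0, 1)$.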

Lemma 6.2.1 Let $A''x = b''$ be the system of equations corresponding to the system
of inequalities added in Reduction 2. Then $A''x = b''$ has a unique solution.

Note  that  this  lemma  does  imply  the  desired  result.  It  implies  that  we  can  use  a 
subroutine  for  recognizing  solvability  to  reduce  the  problem  of  solving  an  inequality 
system  to  that  of  solving  some  equality  system.  But  we  know  a  method  to  solve 
equality  systems,  Gaussian  elimination,  and  this  method  runs  in  polynomial  time 
(as  was  demonstrated  in  exercise  5.1). 

Proof of (6.2.1). First we prove that after Reduction 1, the columns of the matrix
$A$ must be linearly independent. Assume not. Let $x$ be a solution to $Ax \le b$, and
let $a^j$ be any dependent column of $A$. We can "compensate" for $a^j x_j$ by expressing
$a^j$ in the form $a^j = A'z$, where $A'$ is $A$ with $a^j$ deleted. In particular, replacing each
component $x_k$ of $x$, other than $x_j$, by $y_k = x_k + z_k x_j$, we obtain

$A'y = Ax - a^j x_j + A'z x_j = Ax - a^j x_j + a^j x_j = Ax \le b$,

showing that $a^j$ could have been deleted, a contradiction.

Now we prove that $A''x = b''$ has a unique solution. If the rows of $A''$ span the
rows of $A$ this will follow, because then, by the result of the previous paragraph,
the columns of $A''$ must be linearly independent. Suppose that $A''$ does not span
the row space of $A$. Then there is a vector $c$ orthogonal to all the rows of $A''$, but
not all the rows of $A$. Clearly, by adding an appropriate scalar multiple of $c$ to any
solution $x$ of $Ax \le b$, we preserve $A''x = b''$ and can produce equality in some other
inequality. This contradicts the maximality of $[A''\ b'']$. []


6.3  Ellipsoids 

The  ellipsoid  method  may  be  viewed  as  a  kind  of  higher-dimensional  binary  search 
in  which,  instead  of  halving  an  interval  at  each  stage,  we  halve  an  ellipsoid.  In  more 
detail, suppose $P = \{x : Ax \le b\}$ is a polyhedron and an "ellipsoid" $E$ containing
$P$ is given. Then either the "center" $c$ of $E$ is in $P$, in which case we are done, since $P$
has been shown to be nonempty, or $c$ violates one of the inequalities $a^T x \le \beta$ of
$Ax \le b$. In the latter case we find an ellipsoid $E'$ containing $\{x \in E : a^T x \le \beta\}$
and show that $E'$ may be chosen so that its volume is less than the volume of $E$
multiplied  by  a  constant  factor  less  than  1,  dependent  on  the  dimension  n  of  the 
ambient  space  R",  but  independent  of  P  and  E.  This  gives  a  geometric  decrease  in 
the  volume  of  the  ellipsoids  generated.  Finally  we  give  a  “polynomial”  lower  bound 
on  the  volume  of  P,  assuming  it  is  nonempty.  If  P  is  nonempty  we  therefore  find 
that  the  center  of  a  containing  ellipsoid  must  be  in  P  before  the  volume  of  that 
ellipsoid  becomes  too  small. 

For a vector $x \in R^n$, define $\|x\| = \sqrt{x^T x}$. $\|x\|$ is the length or Euclidean norm
of $x$. Let $A$ be an $n \times n$ nonsingular matrix and let $c \in R^n$. Then an ellipsoid $E$
with center $c$ is a set of the form

$E = \{x : \|A(x - c)\| \le 1\}$.

Note that $y = A(x - c)$ iff $x = A^{-1}y + c$. A transformation of the form $T(y) = By + d$,
where $B$ is an $n \times n$ matrix and $d \in R^n$, is called an affine transformation; if $B$
is nonsingular, then $T$ is a nonsingular affine transformation. It follows that an
ellipsoid is the image under a nonsingular affine transformation, $T(y) = A^{-1}y + c$,
of the unit ball in $R^n$, $B^n = \{y : \|y\| \le 1\}$.

It will be convenient here to introduce an alternative definition of ellipsoid. A
positive definite matrix $D$ is a matrix of the form $D = A^T A$ for some nonsingular
matrix $A$. We define the ellipsoid $\mathrm{ell}(c, D)$ with center $c$ by

$\mathrm{ell}(c, D) = \{x : (x - c)^T D^{-1} (x - c) \le 1\}$.

The equivalence to the previous definition is immediate from the definition of
positive definite matrix.

The crucial result for showing that ellipsoids can be used to treat systems of
linear  inequalities  is  a  result  showing  that  any  “half  ellipsoid”  is  contained  in  an 
ellipsoid  the  “volume”  of  which  is  smaller  by  a  suitable  constant  multiple. 

Perhaps a word should be said here about how volume is defined. We need only
some very elementary facts. First, it is clear that whatever definition we choose, it
should assign volume 1 to the unit cubes.² In general, consider the parallelepiped $P$
spanned by a set of vectors $a^1, \dots, a^n$ in $R^n$, that is, the convex hull of the set of all
vectors achievable as sums of subsets of these vectors. Assuming no degeneracies,
there will be exactly $2^n$ such sums, including the origin (which we take to be the
result of the empty sum). Now if we replace one of these vectors $a^j$ by $\alpha a^j$, for a
scalar $\alpha$, then this should multiply the volume of $P$ by $|\alpha|$. On the other hand,
if we replace one of the vectors by its sum with some other vector in this list, say
replacing $a^j$ by $a^j + a^k$, $j \ne k$, then this should leave the volume of $P$ unchanged.

Where $A$ is the matrix with columns $a^1, \dots, a^n$, it is easy to see that the above
conditions imply $\mathrm{vol}\,P = |\det A|$.


Proposition 6.3.1 If $A$ is an $n \times n$ matrix and $P \subseteq R^n$ is "well behaved," then
$\mathrm{vol}\,A(P) = |\det A| \,\mathrm{vol}\,P$.

Proof. If $\mathrm{aff}\,P \ne R^n$, or $A$ is singular, clearly $\mathrm{vol}\,A(P) = 0$; otherwise, approximate
$P$ as a disjoint union of cubes (that this is possible is what we mean by "well
behaved") and apply the conclusion of the previous paragraph. []

We actually use (6.3.1) only for $P$ a very special polyhedron or an ellipsoid. The
application to polyhedra is in the proof of Proposition 6.3.6.

Corollary 6.3.2 $\mathrm{vol}\,\mathrm{ell}(c, D) = \sqrt{\det D}\,\mathrm{vol}\,B^n$.

Proof. Write $D^{-1} = A^T A$. Then $\mathrm{ell}(c, D) = \{x : \|A(x - c)\| \le 1\}$. It follows that
$\mathrm{ell}(c, D) = A^{-1}(B^n) + c$. Hence (6.3.1) implies $\mathrm{vol}\,\mathrm{ell}(c, D) = |\det A^{-1}| \,\mathrm{vol}\,B^n$.

A second proof, not using (6.3.1), runs as follows. Since $D$ is positive definite,
it can be written in the form $D = U^T R U$, where $U$ is orthogonal ($U^T U = I$) and $R$
is diagonal. It should be clear that an orthogonal matrix, the application of which
simply rotates the space, does not change volume, and that applying a diagonal
matrix multiplies volume by the absolute value of its determinant. []


² We take here as unit cubes any set that has the form $I_1 \times \cdots \times I_n$, where each $I_j \in
\{[0,1], (0,1], [0,1), (0,1)\}$ $(j = 1, \dots, n)$.

Theorem 6.3.3 Let $E = \mathrm{ell}(c, D)$ be an ellipsoid with center $c$, and let $a \in R^n$ be
nonzero. Let $H = \{x : a^T x \le a^T c\}$, and let $E' = \mathrm{ell}(c', D')$ where

\[
c' = c - \frac{1}{n+1}\,\frac{Da}{\sqrt{a^T D a}}, \qquad
D' = \frac{n^2}{n^2 - 1}\left(D - \frac{2}{n+1}\,\frac{D a a^T D}{a^T D a}\right).
\]

Then $H \cap E \subseteq E'$ and

\[
\mathrm{vol}\,E' < e^{-1/(2(n+1))}\,\mathrm{vol}\,E.
\]

Proof. In order to simplify the details of the proof, we apply two nonsingular affine
transformations that have the effect of reducing $c$ to 0, $E$ to a unit ball, and $H$ to
$\{x : x_1 \le 0\}$. The success of this approach relies in the end on the fact that when
applied to both $E$ and $E'$, these transformations preserve ellipsoids and
preserve ratios of volumes, the latter because of (6.3.1).

Let $T_1$ be the affine transformation $T_1(x) = A(x - c)$ where $D^{-1} = A^T A$ and $A$
is nonsingular. Then $T_1$ is nonsingular, $T_1(E) = B^n$ and $T_1(E') = \mathrm{ell}(c_1, D_1)$ where

\[
c_1 = -\frac{1}{n+1}\,\frac{b}{\sqrt{b^T b}}, \qquad
D_1 = \frac{n^2}{n^2 - 1}\left(I - \frac{2}{n+1}\,\frac{b b^T}{b^T b}\right),
\]

and $b = (A^T)^{-1} a$. This is verified by substituting $x = A^{-1}y + c$ in the definition of
$\mathrm{ell}(c', D')$. Clearly $T_1(H) = \{x : b^T x \le 0\}$. Note also that $D_1 = A D' A^T$.


For the second transformation, let $U$ be an orthogonal matrix such that $Ub = \alpha d$,
where $d^T = [1\ 0\ \dots\ 0]$ and $\alpha > 0$. (Such a $U$ can be obtained by applying, say, the
Gram-Schmidt process to a basis for $R^n$, the first vector of which is $b$.) Define
the affine transformation $T_2(x) = Ux$ and define $T = T_2 T_1$. Since $T_1$ and $T_2$ are
nonsingular, so is $T$, and it is straightforward to verify that $T(H) = \{x : x_1 \le 0\}$,
$T(E) = B^n$ and $T(E') = \mathrm{ell}(c_2, D_2)$ where

\[
c_2 = -\frac{1}{n+1}\,d, \qquad
D_2 = \frac{n^2}{n^2 - 1}\left(I - \frac{2}{n+1}\,d d^T\right).
\]

Note that $D_2 = U A D' A^T U^T$.

Since $D_2$ is evidently diagonal and has positive entries on the diagonal, it follows
that $D'$ is positive definite. Now to see that $\{x \in B^n : x_1 \le 0\} = T(E) \cap T(H) \subseteq
\mathrm{ell}(c_2, D_2) = T(E')$ involves a straightforward calculation.

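To complete the proof we need to estimate $\mathrm{vol}\,E' / \mathrm{vol}\,E$. Since, by (6.3.1), $T$
preserves ratios of volumes, we have

\[
\frac{\mathrm{vol}\,E'}{\mathrm{vol}\,E} = \frac{\mathrm{vol}\,\mathrm{ell}(c_2, D_2)}{\mathrm{vol}\,B^n}.
\]

But by (6.3.2),

\[
\frac{\mathrm{vol}\,\mathrm{ell}(c_2, D_2)}{\mathrm{vol}\,B^n} = \sqrt{|\det D_2|}
= \frac{n}{n+1}\left(\frac{n^2}{n^2 - 1}\right)^{\frac{n-1}{2}}.
\]

Using the fact that $e^z > 1 + z$ for $z \ne 0$, we have

\[
\frac{\mathrm{vol}\,E'}{\mathrm{vol}\,E} < e^{-\frac{1}{n+1}} \left(e^{\frac{1}{n^2 - 1}}\right)^{\frac{n-1}{2}} = e^{-\frac{1}{2(n+1)}},
\]

as required. []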
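
The update of Theorem 6.3.3 is short enough to try numerically. The following is a minimal floating-point sketch, assuming $n \ge 2$ (the factor $n^2/(n^2 - 1)$ is undefined for $n = 1$) and a symmetric $D$ stored as nested lists; the function names are illustrative. It computes $c' = c - \frac{1}{n+1} Da/\sqrt{a^T D a}$ and $D' = \frac{n^2}{n^2-1}\bigl(D - \frac{2}{n+1} D a a^T D / a^T D a\bigr)$, then compares the volume ratio $\sqrt{\det D' / \det D}$ with the bound $e^{-1/(2(n+1))}$ in the plane.

```python
import math


def ellipsoid_update(c, D, a):
    """Return (c', D') for the half-ellipsoid {x in ell(c, D) : a.x <= a.c}."""
    n = len(c)                                  # requires n >= 2
    Da = [sum(D[i][j] * a[j] for j in range(n)) for i in range(n)]
    aDa = sum(a[i] * Da[i] for i in range(n))   # positive when D is positive definite
    root = math.sqrt(aDa)
    c_new = [c[i] - Da[i] / ((n + 1) * root) for i in range(n)]
    scale = n * n / (n * n - 1.0)
    # D is symmetric, so D a a^T D equals the outer product of Da with itself.
    D_new = [[scale * (D[i][j] - 2.0 / (n + 1) * Da[i] * Da[j] / aDa)
              for j in range(n)] for i in range(n)]
    return c_new, D_new


def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


# Unit disc in the plane, cut by the half-plane x1 <= 0.
c0, D0, a0 = [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0]
c1, D1 = ellipsoid_update(c0, D0, a0)
ratio = math.sqrt(det2(D1) / det2(D0))   # vol E' / vol E by Corollary 6.3.2
bound = math.exp(-1.0 / (2 * (2 + 1)))   # e^{-1/(2(n+1))} with n = 2
```

For the unit disc this gives center $(-1/3, 0)$ and $D' = \mathrm{diag}(4/9, 4/3)$, so the new ellipse touches the boundary points $(-1, 0)$ and $(0, \pm 1)$ of the half disc.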

To  continue  further  we  must  be  able  to  construct  a  starting  ellipsoid,  and  get 
a  lower  bound  on  the  volume  of  the  given  polyhedron.  In  both  cases  we  do  this 
by bounding the complexity of the solutions of systems of linear equations, and
this  in  turn  we  do  by  bounding  the  complexity  of  determinants.  Precise  notions  of 
complexity  for  rational  numbers  are  essential  here. 

Let $r = p/q \in Q$ be a rational number where $p$ and $q$ are relatively prime
integers. Define

$\mathrm{size}(r) = 1 + \lceil \log(|p| + 1) \rceil + \lceil \log(|q| + 1) \rceil$,

where $\lceil \rho \rceil$ is the least integer greater than or equal to $\rho$. Thus, $\mathrm{size}(r)$ is a bound on
the number of binary digits needed to represent $r$. Now let $c \in Q^n$ be an $n$-vector
and $A \in Q^{m \times n}$ an $m \times n$ matrix. Define

$\mathrm{size}(c) = n + \mathrm{size}(c_1) + \dots + \mathrm{size}(c_n)$,

$\mathrm{size}(A) = mn + \mathrm{size}(a_{11}) + \dots + \mathrm{size}(a_{mn})$.
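
These definitions translate directly into code. The sketch below is illustrative only: the function names are assumptions, and it relies on the fact that for a nonnegative integer $k$, $\lceil \log_2(k + 1) \rceil$ equals Python's `k.bit_length()`. The example matrix is hypothetical.

```python
from fractions import Fraction


def size_rational(r):
    """size(r) = 1 + ceil(log2(|p| + 1)) + ceil(log2(|q| + 1)) for r = p/q in lowest terms."""
    r = Fraction(r)   # Fraction keeps p/q in lowest terms with q > 0
    return 1 + abs(r.numerator).bit_length() + r.denominator.bit_length()


def size_vector(c):
    """size(c) = n + sum of the sizes of the components."""
    return len(c) + sum(size_rational(v) for v in c)


def size_matrix(A):
    """size(A) = m*n + sum of the sizes of the entries."""
    return len(A) * len(A[0]) + sum(size_rational(v) for row in A for v in row)


# Hypothetical 2 x 2 rational matrix and its determinant.
A_example = [[Fraction(1, 2), 3], [-2, Fraction(5, 3)]]
det_example = (A_example[0][0] * A_example[1][1]
               - A_example[0][1] * A_example[1][0])   # 41/6
sigma = size_matrix(A_example)
det_size = size_rational(det_example)
```

Here $\mathrm{size}(A) = 22$ and $\mathrm{size}(\det A) = 10$, comfortably below $2\,\mathrm{size}(A)$.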

Proposition 6.3.4 $\mathrm{size}(\det A) < 2\,\mathrm{size}(A)$.

Proof. As the result is trivial for $1 \times 1$ matrices, we may assume $A$ is $n \times n$,
$n > 1$. Let $a_{ij} = p_{ij}/q_{ij}$, where $p_{ij}$ and $q_{ij}$ are relatively prime. Let $\sigma = \mathrm{size}(A)$ and
$\det A = p/q$, again relatively prime. We have the following inequalities:

\begin{align*}
|\det A| &\le \prod_{i,j} (|p_{ij}| + 1), \\
|q| &\le \prod_{i,j} |q_{ij}| < 2^{\sigma - 1}, \\
|p| &\le |\det A|\,|q| \le \prod_{i,j} (|p_{ij}| + 1)\,|q_{ij}| < 2^{\sigma - 1}.
\end{align*}

The first inequality is proved by a simple induction using the fact that $\alpha_1 + \dots + \alpha_k \le
\prod_{j=1}^{k} (|\alpha_j| + 1)$ for any scalars $\alpha_1, \dots, \alpha_k$. The third line of inequalities uses the
first inequality, together with the fact that $n > 1$ to get the last strict inequality.
Combining the last two lines proves the proposition. []

The  next  step  is  to  deal  with  “vertex  complexity.” 

Proposition 6.3.5 If $P = \{x : Ax \le b\} \ne \emptyset$ and $\sigma \ge \mathrm{size}([a^T\ \beta])$ for all rows
$[a^T\ \beta]$ of $[A\ b]$, then the complexity of an extreme point of $P$ is at most $4n^2\sigma$.

Proof. By Proposition 4.1.3, $z$ is an extreme point of $P$ iff $z$ is the unique solution
of some subsystem $A'x = b'$ of $Ax = b$. Applying Cramer's rule, we see that each
component $z_j$ of $z$ is given by an expression of the form $\det B / \det A'$, where $B$ is $A'$
with the $j$th column replaced by $b'$. By hypothesis, $\mathrm{size}(B) \le n\sigma$ and $\mathrm{size}(A') \le n\sigma$.
Hence, $\mathrm{size}(\det B) < 2n\sigma$ and $\mathrm{size}(\det A') < 2n\sigma$. It follows that $\mathrm{size}(z_j) \le 4n\sigma - 1$,
and so $\mathrm{size}(z) \le n + n(4n\sigma - 1) = 4n^2\sigma$. []

We  need  one  more  preliminary  result,  a  method  for  bounding  the  volume  of  a 
polyhedron  from  below.  To  this  end  we  give  an  exact  formula  for  the  volume  of  one 
very  special  polyhedron; 

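Proposition 6.3.6 Let $x^1, \dots, x^{n+1} \in R^n$. Then

\[
\mathrm{vol}\,\mathrm{conv}\{x^1, \dots, x^{n+1}\} = \frac{1}{n!}\left|\det\begin{bmatrix} 1 & \cdots & 1 \\ x^1 & \cdots & x^{n+1} \end{bmatrix}\right|.
\]

Proof. Consider $X_n(\alpha) = \mathrm{conv}\{0, \alpha e^1, \dots, \alpha e^n\}$ where $e^j$ is the $j$th unit vector in
$R^n$. We claim that $\mathrm{vol}\,X_n(\alpha) = \alpha^n / n!$. This is certainly true for $n = 1$. Assume it
is true for a general $n$. Then by induction,

\[
\mathrm{vol}\,X_{n+1}(\alpha) = \int_0^{\alpha} \frac{(\alpha - x)^n}{n!}\,dx,
\]

which yields

\[
\mathrm{vol}\,X_{n+1}(\alpha) = \frac{\alpha^{n+1}}{(n+1)!}.
\]

Now let $X = \mathrm{conv}\{0, x^1, \dots, x^n\} \subseteq R^n$ and let $A = [x^1 \dots x^n]$ be an $n \times n$
matrix. Then clearly $X = A(X_n(1))$, and so by Proposition 6.3.1 and the result of
the previous paragraph, we have

\[
\mathrm{vol}\,X = |\det A| \,\mathrm{vol}\,X_n(1) = \frac{|\det A|}{n!}.
\]

Let $y^j = \begin{bmatrix} 1 \\ x^j \end{bmatrix}$. We can now prove the proposition:

\begin{align*}
|\det[y^1 \dots y^{n+1}]| &= |\det[y^1 \ \ y^2 - y^1 \ \dots \ y^{n+1} - y^1]| \\
&= |\det[x^2 - x^1 \ \dots \ x^{n+1} - x^1]| \\
&= n!\,\mathrm{vol}\,\mathrm{conv}\{0, x^2 - x^1, \dots, x^{n+1} - x^1\} \\
&= n!\,\mathrm{vol}\,\mathrm{conv}\{x^1, x^2, \dots, x^{n+1}\}. \quad []
\end{align*}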
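
The determinant formula for the volume of a simplex is easy to evaluate exactly. The sketch below is an illustration under stated assumptions: the function names are hypothetical, points are given as tuples of integers or fractions, and the determinant is computed by cofactor expansion, which is adequate only for small $n$.

```python
from fractions import Fraction
from math import factorial


def det(M):
    """Determinant by cofactor expansion along the first row."""
    if not M:
        return Fraction(1)   # determinant of the empty matrix
    total = Fraction(0)
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * Fraction(entry) * det(minor)
    return total


def simplex_volume(points):
    """Volume of conv{x^1, ..., x^(n+1)} for n + 1 points in R^n."""
    n = len(points) - 1
    # Column j of the matrix is [1, x^j]: a row of ones on top of the coordinates.
    M = [[1] * (n + 1)] + [[p[i] for p in points] for i in range(n)]
    return abs(det(M)) / factorial(n)


triangle_area = simplex_volume([(0, 0), (1, 0), (0, 1)])
shifted_area = simplex_volume([(1, 1), (2, 1), (1, 2)])
tetra_volume = simplex_volume([(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)])
```

The shifted triangle has the same area as the standard one, reflecting the translation step in the last line of the proof.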

We can finally state and prove the validity of a restricted version of the ellipsoid
method. Note that the statement of the algorithm includes the use of an operator
$(\cdot)_p$ that truncates its argument to $p$ binary digits beyond the decimal point. We are
forced to include some such operator if, for no other reason, because of the presence
of a square root in the update formula for $c$. This unfortunately complicates the
proof of validity, and necessitates several technical lemmas. We do not prove them
here, referring the reader rather to [10]. In each case they involve estimates of how
the eigenvalues of the matrices $D$ vary under the truncation.


Algorithm  6.3.7  The  Restricted  Ellipsoid  Algorithm 

Input: An integer $\sigma$ and a bounded polyhedron $P = \{x : Ax \le b\} \subseteq \mathbb{R}^n$ specified
by a rational matrix $[A\ b] \in \mathbb{Q}^{m \times (n+1)}$ such that $\mathrm{size}([a^T\ \beta]) \le \sigma$ and $a \ne 0$ for each
row $[a^T\ \beta]$ of $[A\ b]$. We assume that if $P \ne \emptyset$, then $P$ is full dimensional.

Output: The assertion that $P = \emptyset$, or a point $c \in P$.

Comment: Suppose $P \ne \emptyset$. In the unrestricted version of the algorithm, in which
the assumption of full dimensionality (and boundedness) is removed, the algorithm
does not terminate by yielding a point $c \in P$, but only with the assertion that
$P \ne \emptyset$; see Exercise 6.2. A point in $P$ can then be found using the results of §6.2.


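begin
    $\nu := 4n^2\sigma$;
    $N := 32n^2\nu$;
    $p := 5N^2$;
    $M := 2^\nu$;
    $D := M^2 I$ where $I$ is an $n \times n$ identity matrix;
    $c := 0 \in \mathbb{R}^n$;
    if $Ac \le b$ then output $c$ and stop;
    for $j := 1$ until $N$ do begin
        let $[a^T\ \beta]$ be a row of $[A\ b]$ such that $a^T c > \beta$;
        $c := \left( c - \frac{1}{n+1} \frac{Da}{\sqrt{a^T D a}} \right)_p$;
        $D := \left( \frac{n^2}{n^2 - 1} \left( D - \frac{2}{n+1} \frac{D a a^T D}{a^T D a} \right) \right)_p$;
        comment $(\cdot)_p$ truncates to $p$ binary digits beyond the decimal point;
        if $Ac \le b$ then output $c$ and stop;
    end
    output the assertion that $P = \emptyset$;
end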
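The update steps of (6.3.7) can be sketched in a few lines of code. This is an illustrative simplification rather than the algorithm as stated: it uses floating-point arithmetic in place of the truncation operator $(\cdot)_p$, it takes the starting radius and the iteration limit as parameters (`radius`, `max_iter`) instead of deriving $M$ and $N$ from $\sigma$, and all function names are assumptions of this sketch.

```python
from math import sqrt

def ellipsoid_feasibility(A, b, radius, max_iter):
    """Look for c with A c <= b inside the ball of the given radius.

    Returns such a point, or None if none is found within max_iter updates.
    Requires n >= 2 because of the factor n^2 / (n^2 - 1).
    """
    n = len(A[0])
    c = [0.0] * n
    D = [[radius * radius if i == k else 0.0 for k in range(n)] for i in range(n)]

    def violated_row(point):
        # First row a with a.point > beta, or None if the point is feasible.
        for row, beta in zip(A, b):
            if sum(r * x for r, x in zip(row, point)) > beta:
                return row
        return None

    a = violated_row(c)
    for _ in range(max_iter):
        if a is None:
            return c
        Da = [sum(D[i][k] * a[k] for k in range(n)) for i in range(n)]
        aDa = sum(a[i] * Da[i] for i in range(n))
        if aDa <= 0.0:
            return None               # D lost positive definiteness numerically
        step = sqrt(aDa)
        # c := c - (1/(n+1)) D a / sqrt(a^T D a)
        c = [c[i] - Da[i] / ((n + 1) * step) for i in range(n)]
        # D := n^2/(n^2-1) (D - (2/(n+1)) D a a^T D / (a^T D a)); D is symmetric.
        scale = n * n / (n * n - 1.0)
        D = [[scale * (D[i][k] - 2.0 * Da[i] * Da[k] / ((n + 1) * aDa))
              for k in range(n)] for i in range(n)]
        a = violated_row(c)
    return c if a is None else None

# The square 1 <= x1 <= 2, 1 <= x2 <= 2, written as A x <= b.
A = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
b = [2.0, -1.0, 2.0, -1.0]
point = ellipsoid_feasibility(A, b, radius=4.0, max_iter=200)
print(point)
```

On this small example a feasible point is reached after two updates. The sketch says nothing about the precision issues that the truncation operator and the choice of $p$ are designed to control.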


Theorem 6.3.8 The restricted ellipsoid algorithm is correct and runs in time
polynomial in the size of the input.
Proof. In verifying that the algorithm will actually run, that is, that all the
steps are well defined, the only nontrivial part is to show that $a^T D a > 0$ at each
iteration. But as part of proving Theorem 6.3.3, we proved that if the matrix $D$
given in the statement of the theorem is positive definite, then so is $D'$, and given the
choice of $p$ in the algorithm, Theorem 13.2 of [10] then implies that $(D')_p$ is positive
definite^. Since the initial $D$ in (6.3.7) is obviously positive definite, it follows that
all subsequent $D$ are (the update formula for $D$ being exactly the corresponding
formula from (6.3.3)). But $D$ positive definite and $a \ne 0$ evidently implies $a^T D a > 0$.

As the next step in the proof, we observe that if the algorithm terminates with
the assertion that $c \in P$, then this is obviously correct. Suppose the algorithm
terminates with the assertion that $P = \emptyset$, but in fact $P \ne \emptyset$. We have assumed
that $P$ is bounded, and so it is the convex hull of its extreme points. By (6.3.5)
these extreme points have size bounded by $\nu = 4n^2\sigma$. Let $E_0 = \mathrm{ell}(0, M^2 I) = M B^n$,
and let $E_j = \mathrm{ell}(c, D)$ for the $c$ and $D$ obtained after the $j$th application of the for
loop in (6.3.7), assuming no truncation is performed. Clearly $P \subseteq E_0$, since $E_0$ is
convex and all extreme points of $P$ are in $E_0$: they all have norm at most $M$.

Now by (6.3.5) and (6.3.6)

$$\mathrm{vol}\, P \ge 2^{-2n\nu},$$

and since $\frac{1}{M} E_0 = B^n \subseteq [-1, 1]^n$, we have

$$\mathrm{vol}\, E_0 \le 2^n M^n \le 2^n 2^{n\nu} \le 2^{2n\nu}.$$

Now applying the above inequalities and (6.3.3) yields

$$\mathrm{vol}\, E_N < 2^{-8n\nu}\, 2^{2n\nu} = 2^{-6n\nu}.$$

Were it not for truncation, this would complete the proof, since it implies that the
volume of $E_N$ is smaller than that of $P$, while at the same time $P \subseteq E_N$. Indeed,
$N = 16n^2\nu$ in the statement of the algorithm suffices for this. For the version
that includes truncation and for $N = 32n^2\nu$, as stated in the algorithm, Theorems
13.2 and 13.3 of [10] can be used to show that $\mathrm{vol}\, E_N \le 2^n M^n e^{-N/(8n)}$, and that
$P \subseteq \mathrm{ell}(c, 4D)$ where $E_N = \mathrm{ell}(c, D)$ and $c$ and $D$ are the final (truncated) $c$ and $D$
produced by the algorithm. But then, using (6.3.2),

$$\mathrm{vol}\, P \le \mathrm{vol}\,\mathrm{ell}(c, 4D) = 2^n\, \mathrm{vol}\, E_N \le e^{-2n\nu},$$

a contradiction. This proves the theorem. []

^This proof uses the facts, consequences of the formulas in (6.3.3), that the maximum eigenvalue
of $D'$ is at most 4 times the maximum eigenvalue of $D$ and the minimum eigenvalue of $D'$ is at
least 1/4 times the minimum eigenvalue of $D$.
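To get a feeling for the constants set at the start of (6.3.7), one can evaluate them for a small instance. The sketch below simply applies the assignments $\nu = 4n^2\sigma$, $N = 32n^2\nu$ and $p = 5N^2$; the function name and the sample values $n = 2$, $\sigma = 10$ are illustrative assumptions.

```python
def ellipsoid_parameters(n, sigma):
    """Constants from the initialization of the restricted ellipsoid algorithm."""
    nu = 4 * n * n * sigma      # bound on the size of extreme points
    N = 32 * n * n * nu         # number of iterations of the for loop
    p = 5 * N * N               # binary digits kept by the truncation
    return nu, N, p

nu, N, p = ellipsoid_parameters(2, 10)
print(nu, N, p)                 # M = 2**nu is a number with nu + 1 binary digits
```

All three quantities are polynomial in $n$ and $\sigma$, which is what the running-time claim of Theorem 6.3.8 needs, but even for this tiny instance they are large: 160, 20480 and 2097152000 respectively.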
6.4 Equivalence of Optimization and Separation

As currently stated, (6.3.7) requires explicit knowledge of a defining system $Ax \le b$
for the given polyhedron $P$. However, even for some very simple CO problems,
such as the MST problem (see Theorem 5.3.1), the size of such a system must
necessarily be exponential in the number of variables. What is needed is a version
of (6.3.7) in which $A$ can be treated in some implicit way. That such a version is
possible was observed by Grötschel, Lovász and Schrijver (1981) [“The Ellipsoid
Method and its Consequences in Combinatorial Optimization,” Combinatorica 1,
169-197; corrigendum: 4 (1984) 291-295], Karp and Papadimitriou (1982) [“On
linear characterizations of combinatorial optimization problems,” SIAM Journal on
Computing 11, 620-632], and Padberg and Rao (1980) [“The Russian method and
integer programming,” EGA Working paper, New York University, New York (to
appear in Annals of Operations Research)].

The following modified “oracle” version of the ellipsoid algorithm has the above
property. It is the same as (6.3.7) except for the ‘Input’ and two smaller changes
necessitated by this change: the two lines in the algorithm where we check $Ac \le b$
must be replaced by calls to SEP, and the statement ‘let $[a^T\ \beta]$ be a row of $[A\ b]$
such that $a^T c > \beta$’ must be correspondingly modified.

Algorithm  6.4.1  Oracle  Version  of  the  Restricted  Ellipsoid  Algorithm 

Input: An integer $\sigma$ and a bounded polyhedron $P \subseteq \mathbb{R}^n$ specified by a “separation”
subroutine or “oracle” SEP such that for each $y \in \mathbb{Q}^n$, SEP($y$) runs in time
polynomial in size($y$) and either asserts that $y \in P$, or returns a rational inequality
$a^T x \le \beta < a^T y$ valid for $P$ such that $\mathrm{size}([a^T\ \beta]) \le \sigma$ and $a \ne 0$. We assume that if
$P \ne \emptyset$, then $P$ is full dimensional.

Output: The assertion that $P = \emptyset$, or a point $y \in P$.

begin
    $\nu := 4n^2\sigma$;
    $N := 32n^2\nu$;
    $p := 5N^2$;
    $M := 2^\nu$;
    $D := M^2 I$ where $I$ is an $n \times n$ identity matrix;
    $c := 0 \in \mathbb{R}^n$;
    if SEP($c$) asserts that $c \in P$ then output $c$ and stop;
    for $j := 1$ until $N$ do begin
        let $[a^T\ \beta]$ be the inequality returned by SEP($c$),
            valid for $P$ but violated by $c$ ($a^T c > \beta$);
        $c := \left( c - \frac{1}{n+1} \frac{Da}{\sqrt{a^T D a}} \right)_p$;
        $D := \left( \frac{n^2}{n^2 - 1} \left( D - \frac{2}{n+1} \frac{D a a^T D}{a^T D a} \right) \right)_p$;
        if SEP($c$) asserts $c \in P$ then output $c$ and stop;
    end
    output the assertion that $P = \emptyset$;
end
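The point of the oracle version is that the polyhedron may have far more inequalities than one would want to list. As an illustration only, the sketch below runs the same updates against a hypothetical oracle for the shifted cross-polytope $\{x : \sum_i |x_i - t_i| \le 1\}$, which has $2^n$ defining inequalities but admits an $O(n)$ separation routine. Floating-point arithmetic replaces the truncation operator, and `radius` and `max_iter` are chosen by hand rather than computed from $\sigma$; all names are assumptions of this sketch.

```python
from math import sqrt

def make_cross_polytope_oracle(center):
    """Hypothetical SEP for P = {x : sum_i |x_i - center_i| <= 1}."""
    def sep(y):
        diff = [yi - ti for yi, ti in zip(y, center)]
        if sum(abs(d) for d in diff) <= 1.0:
            return None                         # y lies in P
        # The sign pattern of y - center picks one of the 2^n inequalities.
        a = [1.0 if d >= 0 else -1.0 for d in diff]
        beta = 1.0 + sum(ai * ti for ai, ti in zip(a, center))
        return a, beta                          # a.x <= beta on P, a.y > beta
    return sep

def oracle_ellipsoid(sep, n, radius, max_iter):
    """Ellipsoid iteration driven only by calls to sep (n >= 2)."""
    c = [0.0] * n
    D = [[radius * radius if i == k else 0.0 for k in range(n)] for i in range(n)]
    cut = sep(c)
    for _ in range(max_iter):
        if cut is None:
            return c
        a, _beta = cut                          # a central cut ignores beta
        Da = [sum(D[i][k] * a[k] for k in range(n)) for i in range(n)]
        aDa = sum(a[i] * Da[i] for i in range(n))
        if aDa <= 0.0:
            return None                         # numerical breakdown
        step = sqrt(aDa)
        c = [c[i] - Da[i] / ((n + 1) * step) for i in range(n)]
        scale = n * n / (n * n - 1.0)
        D = [[scale * (D[i][k] - 2.0 * Da[i] * Da[k] / ((n + 1) * aDa))
              for k in range(n)] for i in range(n)]
        cut = sep(c)
    return c if cut is None else None

center = [2.0, 2.0, 2.0]
sep = make_cross_polytope_oracle(center)
found = oracle_ellipsoid(sep, n=3, radius=8.0, max_iter=500)
print(found)
```

The algorithm never sees the list of inequalities; it only asks the oracle for one violated inequality at a time, which is exactly the access pattern required by (6.4.1).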


It is a straightforward exercise to check that the proof of (6.3.8) remains valid
for this modified algorithm. In particular, note that Proposition 6.3.5 can still be
applied, since this result does not depend on how many rows the defining system
has, only on the complexity of these rows. Note also that because of the truncation,
size($c$) is bounded by a polynomial in $\sigma$, and so the calls to SEP($c$) run in
polynomial time.

Finally, observe that (6.4.1) can be used to optimize. The approach
given in §6.2 is no longer valid when a defining system is not explicitly given. One
alternative is to replace it by binary search on objective function values. For,
say, a minimization problem, after having found one feasible point, we have an upper
bound on the optimal value. But we also have a lower bound, because $P \subseteq M B^n$.
Using these bounds, we can get within any desired number $q$ of digits of accuracy
using a number of calls to (6.4.1) polynomial in $q$. Now, by adding an appropriate
perturbation to the objective function, a perturbation with polynomially bounded
size, we can also arrange that the optimal solution is unique and that any solution
optimal for the new objective is optimal for the original objective. But, since we
have a bound on the number of digits required to specify any extreme point of $P$, by
insisting that $q$ is large enough, we can “guess” what that extreme point is. Thus,
we can find the exact optimum of the perturbed problem, which is also an exact
optimum of the original problem.
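The binary search on objective values can be sketched as follows. This is a schematic illustration under simplifying assumptions: `find_point(t)` stands in for a run of (6.4.1) on $P$ intersected with the half-space of points with objective value at most $t$, and here it is replaced by a closed-form toy routine for the square $[1,2]^2$. The perturbation and rounding steps described above, and the full-dimensionality issue for the intersected polyhedron, are not modelled.

```python
def minimize_by_bisection(find_point, objective, lower, upper, digits):
    """Approximate min of objective over P to within 2**(-digits).

    find_point(t) must return some x in P with objective(x) <= t, or None
    if no such point exists. lower and upper bound the optimal value.
    """
    best = find_point(upper)
    if best is None:
        return None                       # P itself is empty
    lo, hi = lower, objective(best)
    tolerance = 2.0 ** (-digits)
    while hi - lo > tolerance:
        mid = (lo + hi) / 2.0
        x = find_point(mid)
        if x is None:
            lo = mid                      # no point of P has value <= mid
        else:
            best = x
            hi = min(mid, objective(x))   # a point with value <= mid exists
    return best

# Toy stand-in for the feasibility routine: P = [1,2] x [1,2], objective x1 + x2.
def toy_find_point(t):
    if t < 2.0:
        return None
    return (1.0, min(2.0, t - 1.0))

def toy_objective(x):
    return x[0] + x[1]

best = minimize_by_bisection(toy_find_point, toy_objective, -8.0, 8.0, 20)
print(best, toy_objective(best))
```

Each pass halves the gap between the bounds, so the number of feasibility calls grows linearly in the number of digits requested, in line with the polynomial bound claimed in the text.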

We might summarize the content of the modified oracle version of (6.3.7) as
showing that if we can solve “separation” in polynomial time for a polyhedron,
then we can solve (linear) optimization for this polyhedron. As our final result in
this section we prove a (simplified version of a) converse due to Grötschel, Lovász
and Schrijver (see the 1981 paper referenced in the first paragraph of this section).
The proof involves a kind of “polarity.” Our version of this proof is brief, in the
hope that the main idea is, in this way, not obscured by technical details. A
complete treatment seems to encounter such details, in abundance, at every turn.

^The reader may ask why we have not stated the ellipsoid algorithm in the above form at the
outset, given that the proof works in essentially the same form. The principal reason is the difficulty
in dealing with the full-dimensionality assumption in an oracle setting. In the exercises we have
indicated a straightforward method when the polyhedron is specified by an explicit linear-inequality
system. However, when it is not, then something more complicated must be done. Karp
and Papadimitriou resolve the issue by first assuming an appropriate bound on the size of the output of
SEP (we have assumed this as well in our modified statement of the algorithm). Grötschel, Lovász
and Schrijver show, however, that this is not necessary, that only a bound on “vertex complexity”
is needed. Indeed, they show, by use of basis reduction techniques due to Lovász, that a modified
version of the algorithm will suffice in this case, if we are willing to apply it $n$ times!
Definition 6.4.2 Given a set $P \subseteq \mathbb{R}^n$, we define the polar $P^*$ of $P$ by
$P^* = \{x : x^T y \le 1 \ \forall y \in P\}$.

Note  that  the  polar  of  a  polyhedron  is  a  polyhedron  by  Theorem  4.1.4. 

We state the following result only for polyhedra, although it is valid for any
closed convex set in $\mathbb{R}^n$. The proof for general convex sets requires a
separating-hyperplane theorem. For polyhedra, these hyperplanes are given in the definition,
and the result is easy.

Lemma 6.4.3 Let $P = \{x : Ax \le b\}$ for $b \in \mathbb{R}^m$ and $A$ an $m \times n$ matrix. Then
$P^{**} = P \iff 0 \in P$. ($P^{**}$ denotes $(P^*)^*$.)

Proof. The necessity of the condition $0 \in P$ is clear: the polar of any set contains
0. To prove the converse, first note that $P \subseteq P^{**}$. Indeed, every vector in $P^*$ has
inner product at most 1 with every vector in $P$, which is the same as saying that
every vector in $P$ has inner product at most 1 with every vector in $P^*$. Suppose
$P \ne P^{**}$. Then there exists $z \in P^{**} \setminus P$. But $z \notin P$ implies there is a row $[a^T\ \beta]$
of $[A\ b]$ such that $a^T z > \beta$. Note that $\beta \ge 0$, since $0 \in P$. Suppose $\beta > 0$. Then
apparently $a' = a/\beta \in P^*$, contradicting the fact that $z \in P^{**}$, since $z^T a' = a^T z/\beta > 1$.

Hence, $\beta = 0$. Then $a^T z = \delta > 0$. Let $a' = 2a/\delta$. Then $a' \in P^*$, since for $x \in P$,
$a^T x \le \beta = 0$, and so $x^T a' \le 0 < 1$. But $z^T a' = 2\delta/\delta = 2 > 1$, again contradicting
the fact that $z \in P^{**}$. []
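A small numerical check may help make the polar concrete. The sketch below is an illustration, not part of the notes: for a bounded polyhedron written as the convex hull of finitely many vertices, $x \in P^*$ exactly when $x^T v \le 1$ for every vertex $v$, because a linear function attains its maximum over $P$ at a vertex. The function name `in_polar` and the tolerance are assumptions of the sketch.

```python
def in_polar(x, vertices, tol=1e-12):
    """Test x in P^* for a polytope P = conv(vertices)."""
    return all(sum(xi * vi for xi, vi in zip(x, v)) <= 1.0 + tol
               for v in vertices)

# P is the square [-1, 1]^2; its polar is the cross-polytope |x1| + |x2| <= 1.
square = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
cross = [(1, 0), (-1, 0), (0, 1), (0, -1)]   # vertices of the polar of the square

print(in_polar((0.5, 0.5), square))    # on the boundary of the cross-polytope
print(in_polar((0.75, 0.5), square))   # outside it
print(in_polar((1, 1), cross))         # a vertex of the square lies in P**
print(in_polar((1.5, 0), cross))       # a point outside the square does not
```

Since 0 lies in the square, Lemma 6.4.3 predicts $P^{**} = P$, and the last two tests are consistent with that: taking the polar of the cross-polytope gives back the square.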

For a set $X \subseteq \mathbb{R}^n$ define the interior of $X$, int $X$, to be the set of all $x^0 \in X$
such that for some $\varepsilon > 0$, $\|x - x^0\| < \varepsilon$ implies $x \in X$.

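Lemma 6.4.4 If $P \subseteq \mathbb{R}^n$ is bounded, full dimensional and $0 \in \mathrm{int}\, P$, then $P^*$ is
bounded and full dimensional.

Proof. The assumptions of the lemma imply that for some $\varepsilon_1, \varepsilon_2 > 0$, $\varepsilon_1 B^n \subseteq P \subseteq
\varepsilon_2 B^n$, which implies that $(1/\varepsilon_2) B^n \subseteq P^* \subseteq (1/\varepsilon_1) B^n$. []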
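The two inclusions in this proof can be spelled out in one line each. The following is a sketch of the omitted step, assuming the Euclidean norm and using the Cauchy-Schwarz inequality; $\varepsilon_1$ and $\varepsilon_2$ are the radii from the proof.

```latex
\begin{align*}
\varepsilon_1 B^n \subseteq P,\ x \in P^*,\ x \neq 0
  &\implies y = \varepsilon_1 x / \|x\| \in P
   \implies 1 \ge x^T y = \varepsilon_1 \|x\|
   \implies \|x\| \le 1/\varepsilon_1, \\
P \subseteq \varepsilon_2 B^n,\ \|x\| \le 1/\varepsilon_2
  &\implies x^T y \le \|x\|\,\|y\| \le 1 \ \text{for all } y \in P
   \implies x \in P^*.
\end{align*}
```

The first line gives boundedness of $P^*$, and the second shows that $P^*$ contains a ball around the origin and is therefore full dimensional.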

Theorem 6.4.5 (Grötschel, Lovász, Schrijver) Let $P \subseteq \mathbb{R}^n$ be a bounded polyhedron
with a given integer $\sigma$ bounding the complexity of any extreme point of $P$.
Assume that if $P \ne \emptyset$ then $P$ is full dimensional, and suppose that an oracle OPT
is given such that for each $c \in \mathbb{Q}^n$:

    OPT($c$) solves the program $\max\{c^T x : x \in P\}$ in time polynomial in $\sigma$ and
    size($c$), that is, OPT($c$) either declares that $P = \emptyset$ or gives an optimal
    solution $x^*$.

Then in time polynomial in $\sigma$ we can construct an algorithm SEP such that for each
$a \in \mathbb{Q}^n$:

    SEP($a$) either declares that $a \in P$, or gives a vector $c \in \mathbb{Q}^n$ and a scalar
    $\beta \in \mathbb{Q}$ such that $c^T x \le \beta$ for all $x \in P$ and $c^T a > \beta$; moreover, SEP($a$)
    runs in time polynomial in size($a$) and $\sigma$.

Proof. If $P = \emptyset$, then SEP is easy to construct. We can test whether this is
the case by solving $\max\{0^T x : x \in P\}$ using OPT(0). In the case that $P \ne \emptyset$
we solve $2n$ additional programs $\max\{c^T x : x \in P\}$ where $c$ ranges over the $n$
unit vectors in $\mathbb{R}^n$ and their negatives. Let $x^1, \ldots, x^{2n}$ be the solutions of these
programs and let $x^0 = (\sum_{i=1}^{2n} x^i)/2n$. Then $x^0 \in \mathrm{int}\, P$. Define $Q = (P - x^0)^*$. Note
that $0 \in \mathrm{int}(P - x^0)$, so that $Q^* = P - x^0$ and $Q$ is a bounded, full-dimensional
polyhedron, by (6.4.3) and (6.4.4); moreover, note that we have a bound on the
complexity of the extreme points of $Q$, because of Proposition 6.3.5 and the fact
that the extreme points of $P - x^0$ give a defining system of inequalities for $Q$ (with
right-hand side all 1s). This bound is polynomial in $\sigma$.

We construct SEP* that solves separation on $Q$ in polynomial time. This will
complete the proof since it will imply, using the ellipsoid algorithm, that we can
construct OPT* for $Q$, and hence, repeating the argument, SEP for $Q^* = P - x^0$.
Obviously, knowing SEP for $P - x^0$ is the same as knowing it for $P$.

Now let us consider the construction of SEP* for $Q$. Let $w \in \mathbb{Q}^n$, and consider
the program $\max\{w^T x : x \in P\}$. Let $x^*$ be an optimal solution of $\max\{w^T x : x \in
P\}$. There are two cases.

Case 1. Suppose $w^T x^* \le w^T x^0 + 1$. Then $w^T x \le w^T x^0 + 1$ for all $x \in P$, that
is, $w^T(x - x^0) \le 1$ for all $x \in P$. Hence, $w \in Q$.

Case 2. Suppose $w^T x^* > w^T x^0 + 1$. Then $x^* \in P$ implies that $y^T(x^* - x^0) \le 1$
for all $y \in Q$, and yet $w^T(x^* - x^0) > 1$. Thus, we have found a hyperplane
separating $w$ from $Q$. []
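The construction of SEP* from OPT is short enough to write out. The sketch below is illustrative only: the oracle `make_opt` is a toy that maximizes over an explicit vertex list, standing in for whatever polynomial-time OPT is assumed in the theorem, and the names `interior_point` and `sep_polar` are hypothetical.

```python
def make_opt(vertices):
    """Toy OPT oracle for P = conv(vertices): returns a vertex maximizing w.x."""
    def opt(w):
        return max(vertices, key=lambda v: sum(wi * vi for wi, vi in zip(w, v)))
    return opt

def interior_point(opt, n):
    """Average the 2n optimal solutions for the objectives +e_i and -e_i."""
    solutions = []
    for i in range(n):
        for sign in (1.0, -1.0):
            w = [sign if k == i else 0.0 for k in range(n)]
            solutions.append(opt(w))
    return [sum(s[k] for s in solutions) / (2 * n) for k in range(n)]

def sep_polar(opt, x0, w):
    """Separation for Q = (P - x0)^*.

    Returns None when w is in Q (Case 1), and otherwise a pair (a, beta)
    with a.y <= beta for all y in Q but a.w > beta (Case 2).
    """
    x_star = opt(w)
    value = sum(wi * (xi - x0i) for wi, xi, x0i in zip(w, x_star, x0))
    if value <= 1.0:
        return None
    a = [xi - x0i for xi, x0i in zip(x_star, x0)]
    return a, 1.0

square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
opt = make_opt(square)
x0 = interior_point(opt, 2)
print(x0)
print(sep_polar(opt, [1.0, 1.0], [0.5, 0.5]))
print(sep_polar(opt, [1.0, 1.0], [1.0, 1.0]))
```

With $x^0 = (1, 1)$ the set $P - x^0$ is the square $[-1,1]^2$, so $Q$ is the cross-polytope: the vector $(0.5, 0.5)$ lies in $Q$, while $(1, 1)$ is cut off by the inequality $y_1 + y_2 \le 1$. One call to OPT per query is all that SEP* needs.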

An important aspect of (6.4.5) is the way in which it makes concrete the
connection between finding a polyhedral description and finding an algorithm for a
combinatorial problem. The ellipsoid algorithm itself makes a precise statement
about how understanding the polyhedron associated with a combinatorial problem
can lead to solving it. The above result proves that in a sense these are equivalent
problems: if we can find an algorithm, then implicitly we can find a good description
of the polyhedron.

Theorem 6.4.5 also has important concrete algorithmic applications. For example,
an important problem in combinatorial optimization is that of minimizing a
“submodular function.” The only known polynomial-time algorithm for doing that
uses (6.4.5). An outstanding open problem is to find a direct algorithm.
Exercises


6.1 In the statement of the Restricted Ellipsoid Algorithm, show how to remove the
boundedness assumption. This may be done by finding a bounded polyhedron $P'$ such that

$P \cap P' = \emptyset$ iff $P = \emptyset$.


6.2  Let  P  =  {x  :  Ax  <  5}  where  A  is  an  m  x  n  rational  matrix,  and  let  6  — 
where  i/  =  4n^  size([A  6]).  Let  =  [^ . .  .^]  6  R”*.  Let  =  {x  :  Ax  <  b  +  p).  Prove  that 
P^  =  0  iff  P  =  0.  Conclude  that  the  full  dimensionality  assumption  on  P  in  the  Restricted 
Ellipsoid  Algorithm  can  be  removed. 

Hint: Make use of the following statement of the Farkas Lemma, deducible from the Strong
Duality Theorem, (4.1.2): Let $A$ be an $m \times n$ matrix and let $b \in \mathbb{R}^m$. Then $Ax \le b$ has a
solution $x$ iff $y^T b \ge 0$ for each vector $y \ge 0$ with $y^T A = 0$. Note that in this statement, at
most $n$ components of $y$ need be positive.
Notes: [1], [8] and [11] are written by computer scientists. [2] is an introduction
to matroid theory, a topic not discussed in any detail in these notes. [4] is an
excellent text on linear programming. [5], written in German, describes an extensive
collection of applications, as well as giving an introduction to most of the important
topics in combinatorial optimization. [6] and [9] are two standard references on
combinatorial optimization. [6] contains a particularly thorough chapter on shortest
paths. The emphasis in [9] is on the “primal-dual approach.” [10] is an excellent
reference work on many aspects of combinatorial optimization, especially, so far as
it concerns these notes, on polyhedral combinatorics and the ellipsoid method.


